= Useful Commands =
Latest revision as of 00:35, 13 August 2014

=Generic=

[http://www.makeuseof.com/tag/an-a-z-of-linux-40-essential-commands-you-should-know/ 40 Essential Commands You Should Know]

=NEcluster specific=

* A script created by Nick Luciano that starts Scale jobs through [[TORQUE/Maui]] with a user-defined delay between jobs. Useful because it spreads out file server load.
<source lang="bash">
#!/bin/bash
# The purpose of this script is to submit several jobs to the queue spaced out
# in time so that they do not do bad things.  "Bad" is in the eye of the
# user, but one such bad thing would be overburdening the fileserver.  Under
# some, perhaps most, scenarios it would not be necessary to submit jobs to
# the queue spaced out over time because the scheduler decides if resources
# (nodes, procs) are available.  But there are some resources that are not
# accounted for by the scheduler, such as the previously mentioned fileserver.
#
# If, for example, you wanted to submit several SCALE jobs simultaneously to
# the queues on necluster, you may find that there are plenty of processors
# available.  But soon you may discover that many of your jobs have suddenly
# crashed.  The reason for this may be that the fileserver 'nefiles' cannot
# handle all the file IO that occurs during SCALE initialization.  But
# should you submit the same jobs spaced out over, say, 15 minutes, you may
# find that all is copacetic.  Spacing out a few jobs manually would be
# doable, but tedious.  Spacing out several jobs manually is a waste of your
# precious time.  This script accomplishes that task so that you can use your
# time more wisely.

# One argument is required
if [ $# -ne 1 ]
then
	echo "File name argument expected"
	exit 1
fi

# The one argument is the name or name fragment of your pbs job scripts in or
# beneath the directory you run the script from.  Wild cards are permitted
# (quote the pattern so find, not the shell, expands it).
nameString=$1

#***************************************************
# Parameters that can be changed
#***************************************************
# number of minutes from now to start first job 0..59
offSet=0
# number of minutes between jobs 0..59
deltaMins=15

#***************************************************
# Don't change these
#***************************************************
zero=0
oneMinute=1
minsPerHour=60
lastMin=59
anHour=100
tomorrow=2400
midnight=0000
#***************************************************

#***************************************************
# Get the current time and directory
#***************************************************
# 10# forces base 10 so hours/minutes such as 08 or 09 are not parsed as
# (invalid) octal in the arithmetic below.
thisHour=$(( 10#$(date +%H) * anHour ))
thisMin=$(( 10#$(date +%M) ))
startDir=`pwd`

#***************************************************
# Compute the start time
#***************************************************
if [ "$offSet" -le "$zero" ]; then
   # offSet must be at least one minute so the first job runs today instead
   # of at this time tomorrow.
   offSet=$oneMinute
fi
thisMin=$((thisMin + offSet))
if [ "$thisMin" -gt "$lastMin" ]; then
   thisMin=$((thisMin - (lastMin + oneMinute)))
   thisHour=$((thisHour + anHour))
fi
startTime=$((thisHour + thisMin))

#***************************************************
# Find the pbs job scripts in the current directory or below
#***************************************************
fileNames=($(find . -name "$nameString"))
for pathname in "${fileNames[@]}"; do
    dir=$(dirname ${pathname})
    file=$(basename ${pathname})
    cd $dir
    command=`printf "qsub -a %04d $file" $startTime`
    echo "Now executing: $command"
    #***************************************************
    # Submit the jobs
    #***************************************************
    $command
    cd $startDir
    startTime=$((startTime + deltaMins))
    # thisHour+anHour is to assure no div by zero at midnight
    remainder=$((((startTime + anHour) % (thisHour + anHour)) - minsPerHour))
    if [ "$remainder" -ge "$zero" ]; then
	thisHour=$((thisHour + anHour))
	if [ "$thisHour" -gt "$tomorrow" ]; then
		thisHour=$midnight
	fi
	startTime=$((thisHour + remainder))
    fi
done
</source>
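The fiddliest part of the script above is the HHMM wall-clock arithmetic it performs for qsub -a. For comparison, the same rollover can be written as a small self-contained helper (a sketch only: the advance function is invented here and is not part of the original script):

<source lang="bash">
#!/bin/bash
# Hypothetical helper showing the HHMM rollover the script does inline.
advance() {
    # $1 = HHMM time, $2 = minutes to add; 10# avoids octal parsing of 08/09
    local h=$(( 10#$1 / 100 ))
    local m=$(( 10#$1 % 100 + $2 ))
    h=$(( (h + m / 60) % 24 ))
    m=$(( m % 60 ))
    printf "%04d\n" $(( h * 100 + m ))
}

advance 0845 15   # prints 0900
advance 2350 15   # prints 0005 (wrapped past midnight)
</source>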


* An abuse of bash to get a pretty-printed list of queues for all nodes:

<source lang="bash">
#!/bin/bash
{ echo '--' ; pbsnodes -a | grep -B 3 properties ; } | tr "\n" ' ' | perl -pi -e 's/\-\-/\n/g' | tr -s ' ' | cut -d ' ' -f 2,11 | tr ',' ' '
</source>
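The pipeline is easier to follow against canned input. The sketch below replaces pbsnodes -a with a fabricated stand-in (the node names and queue names are made up) and drops perl's -i flag, which only matters when editing files in place, not when reading a pipe:

<source lang="bash">
#!/bin/bash
# Fabricated stand-in for `pbsnodes -a`: name, state, np, properties per node,
# with a blank line between node records (as in real pbsnodes output).
fake_pbsnodes() {
cat <<'EOF'
node1
     state = free
     np = 8
     properties = batch,prod

node2
     state = free
     np = 8
     properties = gen
EOF
}

# Same stages as above: grep grabs each node's 4-line block, tr flattens the
# stream, perl turns grep's "--" group separators back into newlines, and
# cut keeps field 2 (node name) and field 11 (the properties value).
{ echo '--' ; fake_pbsnodes | grep -B 3 properties ; } | tr "\n" ' ' \
  | perl -pe 's/\-\-/\n/g' | tr -s ' ' | cut -d ' ' -f 2,11 | tr ',' ' '
# prints an empty line, then "node1 batch prod" and "node2 gen"
</source>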

= Other tips =

* Temperature monitoring: http://necluster.engr.utk.edu/temp

* Old temperature monitor: http://nec549362.engr.utk.edu/cgi-bin/temp.cgi


=The following is obsolete - use the batch system [[TORQUE/Maui]]=

* To list processes you run on the cluster nodes, run this command on the head node:

<source lang="bash">
ListMyProcesses
</source>


* The following command uses gstat to get a list of nodes by load. It then sorts the list by load/free CPUs and connects you to the node with the most free CPUs. The format below is an alias that you can put in your .bashrc file if you want it to be automatically applied to your environment.

<source lang="bash">
alias fss='ssh `gstat -1a -i necluster|grep node|sort -gr -k2|sort -k13|sort -k11|head -n1|cut -f1 -d" "`'
</source>


* Get cluster load information from Ganglia in a terminal:

<source lang="bash">
gstat -p8649 -1a -i necluster
</source>


* The same as the above, but it adds the sum of the user and system load for each node and sorts on that sum; the least loaded nodes are shown first:

<source lang="bash">
gstat -p8649 -1a -i necluster | grep node | awk '{print $11+$13"\t"$1;}' | sort -g
</source>
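The awk/sort stage can be sanity-checked without a live Ganglia daemon by feeding it fabricated gstat-style rows (the field positions 11 and 13 match the command above; the hostnames and load values are made up):

<source lang="bash">
#!/bin/bash
# Fabricated stand-in for `gstat -1a` output; only fields 1, 11, and 13 matter.
fake_gstat() {
cat <<'EOF'
node1 a b c d e f g h i 0.50 x 0.25
node2 a b c d e f g h i 0.10 x 0.05
node3 a b c d e f g h i 1.90 x 0.70
EOF
}

# Sum user ($11) + system ($13) load per node and sort numerically,
# least loaded first.
fake_gstat | grep node | awk '{print $11+$13"\t"$1;}' | sort -g
# prints (tab-separated): 0.15 node2, then 0.75 node1, then 2.6 node3
</source>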


* To list the unloaded nodes, run this command on the head node:

<source lang="bash">
FindFreeNodes
</source>