Getting Started
Getting Started on NECluster - Windows
SSH Client
To connect to the cluster you will need an SSH client. The easiest one to use, in my opinion, is PuTTY. You can either download a standalone executable or use the installer to install everything.
Once you've installed PuTTY you can just run it. A dialog window will pop up asking for some information. In the Host Name box, enter necluster.engr.utk.edu. Make sure the port is 22 and that SSH is selected. If you have an X Window Server (discussed next) and wish to use it, you also have to go to the X11 category and check the box next to Enable X11 forwarding. Note that on the first screen you can save your settings so that you don't have to type this in every time. Once done, click Open and enter your user name and password when prompted.
X Server
If you want to use some of the GUI programs on the cluster, you will need to install an X Server on your machine. A nice freeware X Server is Xming. When downloading Xming, first install the package Xming and then install the package Xming-fonts. If you wish, you can install Xming-mesa instead of Xming for additional graphics capabilities, though these probably won't be used over a network connection anyway.
Once Xming is installed, you can run it from the start menu. It may seem like nothing is running after you click it, but if you check the application area of your task bar, you should see the Xming icon.
Note that you can start Xming before or after you start PuTTY. As long as you forwarded your X connection it will work.
Getting Started on NECluster - Mac/Linux
If you're running Mac OS X or any version of Linux, it's even easier to get on the cluster. You generally already have an SSH client and X Server installed! To log on to the cluster, open a terminal window and type the command:
ssh -X -l user necluster.engr.utk.edu
OR
ssh -X user@necluster.engr.utk.edu
The -X option forwards the X connection. You can omit it if you don't plan on using any programs that need it. As on Windows, most terminal programs allow you to save sessions that you want to use regularly.
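A quick way to confirm that forwarding actually took effect is to check the DISPLAY variable after logging in, since sshd sets it in the remote session when X11 forwarding is active (typically to something like localhost:10.0). The function name check_x_forwarding is made up for illustration; this is a minimal sketch, not a tool provided by the cluster:

```shell
# Hypothetical helper: report whether this login session has X forwarding.
check_x_forwarding() {
    if [ -n "${DISPLAY:-}" ]; then
        # sshd exports DISPLAY only when forwarding was negotiated.
        echo "X forwarding looks active (DISPLAY=$DISPLAY)"
    else
        echo "DISPLAY is not set; reconnect with ssh -X" >&2
        return 1
    fi
}
```

Run check_x_forwarding on the cluster right after connecting. If it reports that DISPLAY is not set, reconnect with -X (or enable X11 forwarding in PuTTY).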
It is advisable to learn a bit about the powerful ssh command. Here are some links to start:
http://tychoish.com/rhizome/9-awesome-ssh-tricks/
http://www.mynitor.com/2010/08/07/the-ultimate-ssh-tricks-manual/
You can also create the file ~/.ssh/config to tell ssh how you want it to behave:
ForwardX11 yes
ForwardAgent yes
ForwardX11Trusted yes

Host cluster
    HostName necluster.engr.utk.edu
    User <your_username>
    IdentityFile ~/.ssh/id_rsa.UTKNEcluster
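If you would rather script this than edit the file by hand, the following is one possible sketch. It appends a Host stanza like the one above only when that alias is not already defined, so running it twice does not duplicate the entry. The function name add_host_alias and its argument order are assumptions made for this example:

```shell
# Hypothetical helper: add "Host <name>" to an ssh config file once.
# Arguments: config file, alias name, real host name, user name.
add_host_alias() {
    config=$1
    name=$2
    host=$3
    user=$4
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"   # keep the file private to your account
    # Skip the append when a stanza for this alias already exists.
    if grep -qE "^[[:space:]]*Host[[:space:]]+$name([[:space:]]|\$)" "$config"; then
        return 0
    fi
    printf '\nHost %s\n    HostName %s\n    User %s\n' "$name" "$host" "$user" >> "$config"
}

# Example call (replace your_username with your own account name):
# add_host_alias ~/.ssh/config cluster necluster.engr.utk.edu your_username
```

With the stanza in place, ssh cluster is shorthand for the full ssh user@necluster.engr.utk.edu command.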
First Time on the Cluster
Changing Password
The first time you're on the cluster, the very first thing you will want to do is change your password from the temporary one that you were assigned. This is done with the yppasswd command:
user@necluster ~ $ yppasswd
Changing NIS account information for user on nefiles.
Please enter old password:
Changing NIS password for user on nefiles.
Please enter new password:
Please retype new password:

The NIS password has been changed on nefiles.
Your password has been changed!
Getting on Other Nodes
When you first log on your prompt will be:
user@necluster:~$
This shows that you are on the head node. When you run cases, you'll want to run them on one of the many compute nodes, which you can find on Ganglia. They are named node#, where # is the node number. The very first thing to do is create a public key file so you don't have to enter your password every time you want to connect to a compute node. To do this, follow these steps:
user@necluster:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa): Press <Enter>
Enter passphrase (empty for no passphrase): Press <Enter>
Enter same passphrase again: Press <Enter>
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
Stuff
The key's randomart image is:
Funny boxed picture
user@necluster:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
user@necluster:~$ <done>
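If key login still asks for a password afterwards, a common cause is file permissions: with sshd's default StrictModes setting, keys are ignored when ~/.ssh or authorized_keys is writable by other users. The sketch below repeats the cat step above so that it is safe to run more than once and tightens the permissions as it goes. The function name install_pubkey is an assumption for illustration, and whether this cluster's sshd uses the default setting is not confirmed here:

```shell
# Hypothetical helper: authorize a public key once, with safe permissions.
# Arguments: public key file, authorized_keys file.
install_pubkey() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    chmod 700 "$(dirname "$auth")"   # only the owner may enter ~/.ssh
    touch "$auth"
    # Append only if the exact key line is not already present.
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"                # only the owner may read or write
}

# Example call, using the default paths from the steps above:
# install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```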
To connect to one of these nodes you can just SSH to it:
user@necluster:~$ ssh node15
Cluster MOTD Information
user@node15:~$
If it is your first time on the node, you will have to verify the authenticity of the node's SSH key. If you have a few minutes, you can run a script that connects to every node in a list, so you can just type yes about 30 times and then not have to worry about it.
This is the current incarnation of the script:
#!/bin/bash
for i in {2..31}
do
ssh node$i hostname
done
I have the script in my home directory, so instead of copying the script out yourself, you can just run it from my directory as follows:
user@necluster:~$ ~shart6/test_nodes
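If the node range changes or some nodes are down, a slightly more defensive variant of the script above may be handy. This is only a sketch: the function names node_names and visit_nodes are made up for the example, and the 5-second timeout is an arbitrary choice. ConnectTimeout is a standard ssh option that keeps the loop from hanging on an unreachable node:

```shell
# Hypothetical helper: print node names for an inclusive range, one per line.
node_names() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "node$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: connect to each node once and report failures.
visit_nodes() {
    for node in $(node_names "$1" "$2"); do
        # You are still asked to type yes for each new host key.
        ssh -o ConnectTimeout=5 "$node" hostname || echo "unreachable: $node" >&2
    done
}

# Example call, covering the same range as the script above:
# visit_nodes 2 31
```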
Checking How Heavily Nodes Are Loaded
Either use the Ganglia web interface or type in a terminal:
/opt/ganglia/bin/gstat -p8649 -1a -i necluster
You can also sum the user and system load and sort on the sum. The least loaded nodes are shown first:
/opt/ganglia/bin/gstat -p8649 -1a -i necluster | grep node | awk '{print $11+$13"\t"$1;}' | sort -g
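To pick a node automatically, the same pipeline can be wrapped in a small function. The sketch below takes the field positions (11 and 13 for user and system load, 1 for the host name) from the command above rather than checking them independently, and the function name rank_nodes is an assumption for illustration:

```shell
# Hypothetical helper: read gstat -1 style lines on stdin and print
# node names only, least loaded first.
rank_nodes() {
    # Keep compute nodes, emit "<user+system load><TAB><name>", sort numerically,
    # then drop the load column.
    awk '$1 ~ /node/ { print $11 + $13 "\t" $1 }' | sort -g | cut -f2
}

# Example call: name of the single least loaded node.
# /opt/ganglia/bin/gstat -p8649 -1a -i necluster | rank_nodes | head -n 1
```

Load changes quickly, so treat the result as a hint and check the node again before starting a long run.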
Please never run your code on the head node ("necluster") or any of the fileservers. Only use the machines which have *node* in their hostname.
Common Problems
Sometimes, when messing with a cluster, something will break and I'll have to regenerate a compute node's SSH key. When this happens, you will get a long error message when you try to log in to that node:
user@necluster:~$ ssh node#
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
93:a2:1b:1c:5f:3e:68:47:bf:79:56:52:f0:ec:03:6b.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending key in /home/user/.ssh/known_hosts:377
RSA host key for node# has changed and you have requested strict checking.
Host key verification failed.
The easiest way to fix this error message is to remove the relevant key so that when you connect to the node again you can add the new key to the file:
user@necluster:~$ ssh-keygen -R node#
/home/user/.ssh/known_hosts updated.
Original contents retained as /home/user/.ssh/known_hosts.old