Brian Luethke
John Mugler
Stephen L. Scott
Oak Ridge National Laboratory
Among the more complex tasks of system administration is upgrading a large number of machines in a cluster. C3 (Cluster Command & Control) Power Tools greatly simplify that task. The cpushimage command and Systemimager are used to manage system images across cluster nodes, and even across multiple clusters.
cpushimage enables an administrator to perform several powerful operations that were previously too expensive to attempt. One of the most useful is the easy and relatively fast manipulation of images. That allows an administrator to build a custom image for a single user, roll that image out to one or more clusters, allow that user to run his applications, and then revert to the default software configuration.
Provided a known good image is maintained, you may even give your users root access to rebuild the cluster as they see fit. Once they are done, you can easily revert to your known good image, thus effectively removing any changes they made. As Systemimager stores the node's image entirely on the file system, it is possible to keep images of each cluster's head node on a different machine. That image, in turn, may contain images within it. You can use cpushimage to revert to any backup images you have made.
Working with the subset of a cluster
Using C3's command line ranges, it is easy to test a new configuration on a subset of the cluster by pushing the new image to the specified node subset. For example, we have built a custom setup on a single node that a user has requested. Using the following Systemimager command we can retrieve that image:
getimage -golden-client=node0.csm.ornl.gov -image=genome_image
This command copies the full image (partition tables and file system) of node0.csm.ornl.gov to the machine where the command originated from, naming the retrieved image genome_image.
Using cpushimage we can distribute that image to the entire cluster, several clusters, or a subset of the clusters. To first test the configuration on a few nodes, an administrator may run the following command from the cluster's head node:
cpushimage --reboot :1-8 genome_image
In the above example, cpushimage is called with the --reboot option. That tells cpushimage to reboot the nodes after the image has been transferred.
The next command parameter is the machine definitions section. The format of that section follows the pattern: cluster_name:range. In the above example, cpushimage executes on nodes 1 through 8, inclusive, on the default cluster (the default cluster is the first cluster listed in C3's configuration file, and need only be indicated by a ":" on the command line). The range is based on position in the C3 configuration file (the first node in the list is at position 0, the second at position 1, and so on). Nodes that are listed as offline do not participate in the command at runtime - thus node position can be maintained regardless of a node's availability.
For example, we might use the following c3.conf file:
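The cluster_name:range pattern can be illustrated with a small parser. This is only a sketch of the notation as described here, not C3's own parsing code: the names MachineDef and parse_machine_def are hypothetical, and the sketch assumes a single contiguous range per definition.

```python
import re
from typing import NamedTuple, Optional


class MachineDef(NamedTuple):
    cluster: Optional[str]  # None stands for the default cluster
    start: Optional[int]    # None stands for "all nodes in the cluster"
    end: Optional[int]


# Optional cluster name, a mandatory colon, then an optional range.
_PATTERN = re.compile(r"^(?P<cluster>[A-Za-z0-9_.-]*):(?P<range>\d+(?:-\d+)?)?$")


def parse_machine_def(text: str) -> MachineDef:
    """Split one machine definition such as 'torc:1-8' or ':1-8'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a machine definition: {text!r}")
    cluster = match.group("cluster") or None
    node_range = match.group("range")
    if node_range is None:
        return MachineDef(cluster, None, None)
    first, _, last = node_range.partition("-")
    start = int(first)
    end = int(last) if last else start  # a single position such as ':3'
    if end < start:
        raise ValueError(f"range runs backwards: {text!r}")
    return MachineDef(cluster, start, end)
```

With this sketch, ":1-8" yields the default cluster with positions 1 through 8, and "xtorc:" yields the whole of xtorc.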
cluster torc: {
torc:node0
dead placeholder
node[1-64]
}
In this example, the cluster is named torc and has 64 nodes, plus one head node. The head node's external interface is torc and its internal interface is named node0. Since we decided to name our first compute node node1, and C3 zero-indexes ranges, we have added a placeholder at the beginning of the compute node section (marked dead so it does not participate in the cluster), thus shifting the indexing to a 1-based scheme. The last line defines 64 nodes.
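The effect of the dead placeholder on position numbering can be checked with a short sketch. It is an illustration of the indexing idea only, not C3 code: expand_compute_lines and select are hypothetical helpers, and the sketch assumes that each "dead" line occupies exactly one position.

```python
import re
from typing import List, Optional

# Matches a bracketed range such as node[1-64].
_RANGE = re.compile(r"^(?P<prefix>[^\[\]]+)\[(?P<lo>\d+)-(?P<hi>\d+)\]$")


def expand_compute_lines(lines: List[str]) -> List[Optional[str]]:
    """Return node names by position; None marks a dead (offline) slot."""
    positions: List[Optional[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("dead "):
            # A dead entry keeps its position but takes no part in commands.
            positions.append(None)
            continue
        match = _RANGE.match(line)
        if match:
            lo, hi = int(match.group("lo")), int(match.group("hi"))
            positions.extend(f"{match.group('prefix')}{i}" for i in range(lo, hi + 1))
        else:
            positions.append(line)
    return positions


def select(positions: List[Optional[str]], start: int, end: int) -> List[str]:
    """Pick positions start..end inclusive, skipping dead slots."""
    return [name for name in positions[start:end + 1] if name is not None]


compute = expand_compute_lines(["dead placeholder", "node[1-64]"])
print(select(compute, 1, 8))
```

Because the placeholder takes position 0, the range 1-8 lands on node1 through node8.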
Once this image has been tested thoroughly it can easily be pushed to the entire cluster with the following command:
cpushimage --reboot genome_image
In this command, --reboot is used once again. Since no machine definition section or cluster has been specified, it is assumed that all of the default cluster is the desired set, and once again genome_image is specified.
Configuring multi-clusters
With C3, the above concept can easily be extended to federated clusters, that is, a cluster of clusters. First, several clusters must be defined in the C3 configuration file (for a detailed explanation of the file format please see the C3 project's web page, or the c3.conf man page).
Next, specify the clusters to participate in the command. To extend the first example to a multi-cluster example, we would first, on each head node, prepare a genome_image as before. Then to distribute that image for a test to three clusters - torc, htorc, and xtorc - we issue the following command:
cpushimage --reboot torc:1-8 htorc:1-8 xtorc:1-16 genome_image
Notice the machine definitions section of this example command line: Each cluster in the list can specify a different number of nodes. In this case, torc and htorc are smaller than xtorc, so the decision was made to test on a few more nodes in xtorc. One important thing to note is that the specified image must reside on each individual cluster's head node, not on the machine the command is issued from, because Systemimager does not support nested images. Systemimager does not allow an administrator to store a node's image locally, push that image to a head node, and then push it on to a compute node.
Using C3's multi-cluster architecture enables a system administrator to easily set global policies across all clusters under their control, and to maintain specific subsets of those policies. It is important to note that C3 functionally scales from individual workstations to the Grid. Potential policy actions include user management, disk management, and controlling services.
Managing users
Another time-consuming task for a system administrator is adding and removing user accounts. Maintaining NIS may not be feasible for many cluster installations, as the operational overhead can be quite high. Using a combination of techniques in C3, it is possible to easily add and remove users from every cluster controlled by an administrator.
One tool contributed to C3 is an add_user script. Scripts in the contrib section of the C3 distribution depend greatly on the software/hardware setup of each individual site. They are intended as a model for administrators to modify. The add_user script functions by first getting the username and group to add from the command line as follows:
add_user sgrundy users
The script then uses the standard Linux useradd command to add the user to the head node and sets the user's password with the Linux passwd command. Following this, the script uses cpush, C3's file scatter operation, to distribute both the password and shadow password files across the entire cluster. Next, the user's ssh keys are created in their home directory. All file permissions and ownerships are then set correctly for the user.
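The sequence of steps can be sketched as a dry-run plan. This is not the contributed add_user script, which is not reproduced in this article. The function name plan_add_user, the file paths /etc/passwd and /etc/shadow, the RSA key type, and the exact options shown are all assumptions made for illustration, and the sketch only builds the command lines rather than executing them.

```python
from typing import List


def plan_add_user(username: str, group: str, home_root: str = "/home") -> List[List[str]]:
    """Return, in order, the commands a site script might run (dry run only)."""
    home = f"{home_root}/{username}"
    ssh_dir = f"{home}/.ssh"
    return [
        # 1. Create the account on the head node.
        ["useradd", "-g", group, "-m", "-d", home, username],
        # 2. Set the password; this step prompts on stdin.
        ["passwd", username],
        # 3. Scatter the account files to the compute nodes (assumed paths).
        ["cpush", "/etc/passwd"],
        ["cpush", "/etc/shadow"],
        # 4. Create ssh keys in the user's home directory (assumed key type).
        ["mkdir", "-p", ssh_dir],
        ["ssh-keygen", "-t", "rsa", "-N", "", "-f", f"{ssh_dir}/id_rsa"],
        # 5. Fix ownership and permissions.
        ["chown", "-R", f"{username}:{group}", home],
        ["chmod", "700", ssh_dir],
    ]


for step in plan_add_user("sgrundy", "users"):
    print(" ".join(step))
```

Keeping the plan as data makes it easy to review or adapt the steps for a particular site before anything is run as root.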
In the following example, it is assumed that /home is NFS-mounted on the cluster, but no other directories are. This script can save significant amounts of time even when administering only a single cluster. It has proven to be one of the most useful C3 scripts when adding a user to many clusters at once. Using C3's general cluster exec() command - cexec - one can easily add the user to as many clusters as the administrator has access to, for example:
cexecs --head torc: htorc: xtorc: /sbin/add_user sgrundy users
First, note that the command actually run was cexecs, not cexec. That is because the add_user script needs input from stdin (from the passwd call). cexecs is capable of taking standard input, as it is a serialized version of cexec.
Normally all C3 commands run multi-threaded. However, multi-threaded execution interleaves all prompts and output. Thus a serial mode of cexec is needed. Serial mode is also useful when a deterministic execution model is required. The option --head specifies that the command is to run only on the head node of the clusters.
When the --head option is not specified, the command runs only on the cluster nodes. Thus, if you want both the cluster nodes and the head node to participate in a command, you must specify two commands to be run, one with the --head option and one without.
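A small helper can make the two-command pattern explicit. This is a hedged sketch rather than part of C3: head_and_compute_commands is a hypothetical name, and it only assembles the two command lines using the --head option and cluster: notation described in this article.

```python
from typing import List, Sequence


def head_and_compute_commands(
    clusters: Sequence[str], command: Sequence[str], tool: str = "cexec"
) -> List[List[str]]:
    """Build one command for the head nodes and one for the compute nodes."""
    targets = [f"{name}:" for name in clusters]
    head_only = [tool, "--head", *targets, *command]
    compute_only = [tool, *targets, *command]
    return [head_only, compute_only]


for argv in head_and_compute_commands(["torc", "htorc"], ["uptime"]):
    print(" ".join(argv))
```

Running both generated commands covers every machine in the listed clusters.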
As you can see from the above example it is just as easy to add a user to a single machine as it is for multiple machines - simply specify more than one cluster. Another thing to note is that the above command does not assume that you are on any particular machine: The command could be run from any of the head nodes of the above cluster, or even from a laptop from your home or your desktop. The only C3 stipulation is that the machine where the command originates be capable of making a connection to the head node of the specified clusters.
C3 only communicates with the head node on a remote cluster, using that node as a gateway. Thus, a cluster with an exposed head node and the compute nodes on a private network is capable of being remotely administered using C3. C3 may access remote clusters in two ways: 1) direct remote, when the machine that the C3 command is executed on knows the layout of the remote cluster via the local configuration file; and 2) indirect remote, when the machine knows only the head node of the remote cluster, so the architecture of the cluster is unknown to the local machine and is determined by the configuration file on the remote head node. The different cluster communication patterns are illustrated in Figure 1.
Figure 1: Cluster communication patterns
Other C3 tools
C3 contains several other tools, such as cluster remove, file scatter, file gather, cluster kill, and a cluster shutdown. In particular the file scatter (cpush) and file gather (cget) have found extensive use on the various ORNL clusters.
cpush can be used for many different tasks such as pushing out new configuration files, syncing password files, distributing binaries, etc. cget is valuable when debugging a cluster: An administrator may easily retrieve log files from multiple clusters to her desktop for manipulation or record keeping. All the tools with the exception of cpushimage and cshutdown are also available to users to manage their workspaces. These two commands are restricted to users with root access privileges.
The C2G user interface
C2G provides a simple-to-use GUI for the C3 Power Tools. C2G is designed as a remote, standalone application capable of interacting with remote clusters. C2G has two main panels: the panel in the middle displays all the clusters in the configuration files. The panel on the bottom shows the output of the function that was invoked. There is also a menu bar across the top, which contains a few basic things that C2G can do on its own, such as run commands via ssh on remote computers or configure itself. Directly underneath these options, the modules that a user has in their ~user/.c2g/modules directory appear. Modules are simply Python/Tk menu button widgets constructed in a form that C2G can load. The modules can be run independently from the base system, which is a good way to test them. The C2G base system loads these Python sources at runtime and executes them.
C2G installs the base system into /opt/c2g/ and the individual modules are placed in the user's directory. All of the basic GUI resides in /opt/c2g, as it is a common set of files that all users will employ. Additionally, each user must have a ~user/.c2g/ directory in their directory structure. When the user invokes C2G, anything found in ~user/.c2g/modules is loaded dynamically into the system. Additionally, a configuration directory exists in .c2g/config for the purpose of specifying clusters and nodes. That allows each user account to have their own set of modules and cluster configurations.
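Loading Python sources from a per-user directory at runtime can be sketched as follows. This is not C2G's loader: it uses the importlib API of current Python 3 rather than the Python 2 of the period, load_modules is a hypothetical name, and the sketch assumes that each module is a single .py file.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def load_modules(directory: Path) -> Dict[str, ModuleType]:
    """Import every *.py file found in directory and return them by name."""
    loaded: Dict[str, ModuleType] = {}
    for source in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(source.stem, source)
        if spec is None or spec.loader is None:
            continue  # not importable as a Python source file
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        loaded[source.stem] = module
    return loaded
```

A call such as load_modules(Path.home() / ".c2g" / "modules") would then pick up whatever a given user has installed, which is what lets each account carry its own set of modules.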
Currently, each cluster head node is named in ~user/.c2g/config/clusters, and there must be a file for each head node named there. Each file contains a listing of the nodes that the cluster contains. There is no fixed limit to the number of clusters or nodes one may put in the system; however, practical constraints suggest a limit of around 64-96 compute nodes before the GUI gets too cumbersome. Interestingly, if the clusters are not expanded in the GUI, this means around 64-96 head nodes can be displayed reasonably by C2G. The scalability of GUIs to handle large clusters is an active research area. One solution may be to segment clusters into multiple-host, non-divisible units rather than down to individual hosts. This is one way that C2G may represent extremely large clusters.
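Reading such a configuration might look like the sketch below. The on-disk format is not specified in this article, so several things here are assumptions: that the clusters file lists head node names separated by whitespace, that each per-cluster file sits in the same config directory under the head node's name, and that it lists node names separated by whitespace. read_cluster_config is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List


def read_cluster_config(config_dir: Path) -> Dict[str, List[str]]:
    """Map each head node named in 'clusters' to the nodes in its own file."""
    clusters: Dict[str, List[str]] = {}
    index = config_dir / "clusters"
    for head in index.read_text().split():
        node_file = config_dir / head  # assumed location of the per-cluster file
        if not node_file.is_file():
            raise FileNotFoundError(f"no node list for head node {head!r}")
        clusters[head] = node_file.read_text().split()
    return clusters
```

Raising an error for a missing per-cluster file mirrors the requirement that every head node listed must have a file of its own.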
The basic C2G GUI system is very simple. It is designed purposely to allow for maximum flexibility on the part of the module authors. C2G offers basic display functionality for those applications that do not need a complex GUI tool. In the event a complex display is needed that C2G can not handle, the module itself must provide that option. The only output facility that C2G offers is a Text widget that displays STDOUT and STDERR on a Unix system.
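Collecting both output streams of a command into one block of text, ready to be placed in a text widget, can be sketched with the standard subprocess module. This is an illustration in current Python 3, not C2G's code, and run_for_text_widget is a hypothetical name.

```python
import subprocess
from typing import List


def run_for_text_widget(argv: List[str]) -> str:
    """Run a command and return its STDOUT and STDERR as one string."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold STDERR into the STDOUT stream
        text=True,
        check=False,               # a failing command still yields its output
    )
    return result.stdout
```

The returned string could then be inserted into a Tkinter Text widget by the caller. A module needing richer display would have to supply its own widgets.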
C2G gives the module writer a configuration system that allows a user to specify, via mouse clicks, the individual nodes on which to run a program. C2G can also specify entire clusters via mouse clicks, or subsets of nodes from multiple clusters. That makes pushing a file or running a job on a subset of the nodes in your working cluster list very user friendly. C2G also offers the ability to load individual modules at runtime, thus allowing a user to use only those modules that benefit her. In this way, the system is extensible for both system administrators and users. That extensibility is perhaps one of the most significant benefits of C2G.
C2G's capabilities
With no modules installed, C2G by itself can perform the following simple tasks: It can upload files to cluster head nodes via secure copy (scp), spawn a working ssh session for a head node or any node, or run a simple add-on script.
C2G exploits modules to increase its functionality. The C3 module is currently the only supported module for C2G, and provides a proof-of-concept implementation for others to follow. In the works are several other modules including one for the SciDAC:SSS version of MAUI, and a monitoring module that will make use of the Ganglia Monitoring System (gmond) (see Resources).
Currently, C2G's C3 module can perform three common C3 tasks:
- A file "push" or scatter operation via C2G allows the user to select, via a mouse click, both the nodes to participate as well as the files to be pushed. The push action may be used to distribute both executables as well as other files. C3 is then invoked to effect the file scatter operation.
- The inverse of the file scatter is the file-gather operation. C2G may similarly invoke the file-gather, or "get," operation. The "Get" is useful for those applications that produce local cluster node data that must be retrieved by the head node.
- cexec, the basic parallel exec() of C3, is supported to enable one to invoke any command in parallel across one or more clusters.
To use the system, first the user selects the clusters and/or nodes to run a command on from the bottom GUI panel:
Figure 2: Selecting clusters and nodes in the C2G interface
Then, the user must click on the C3 menu to display the C3 commands:
Figure 3: The C2G interface displays C3 commands
Invoking one of the commands causes various input boxes to show. These boxes must be filled in with the appropriate information. When everything is ready, the user selects the "run" button to perform the action:
Figure 4: Performing an action through C2G
The system is designed to be simple and friendly for users that do not wish to use a command line. Most of the output is displayed in the Text field in the middle of the GUI, with a few pop-up boxes to help out. By default, output is routed to the text area:
Figure 5: Text-based output in C2G
We earlier discussed the ability to script C3 commands to perform tasks on a cluster. C2G expands on this notion by having an adduser command as an option in the C3 module that invokes the customized add_user script. Although the options are somewhat limited in the GUI, this user interface capability makes adding users quick and easy, as no complex command line must be generated by the user. In this way, C2G further simplifies the use of C3 commands.
C2G is designed to run on a user's desktop; however, C2G will also run directly on the head node. Currently, C2G and the C3 module are coded using Python. Requirements for using C2G with the C3 module include Python2 with the Tkinter module, SSH, and C3. C2G currently runs on multiple versions of RedHat Linux, and should run on most any version of Unix. We are considering porting C2G to other operating systems. Additional work for C2G includes improving the look of the GUI itself and dealing with the scalability issue of representing hundreds or thousands of nodes in a GUI format.
Summary
When combined, the C3 Power Tools and C2G provide a very flexible, powerful, and easy-to-use set of tools for the administration and general use of network computing architectures in any combination of single workstations, workstation pools, clusters, federated clusters, and the grid. C3 is also included as one of the core components in the OSCAR cluster software stack (see Resources). At the time of this writing, OSCAR has had more than 80,000 downloads. Thus, C3 may be found on many clusters today. The public release of C2G is planned for March 2003, which will coincide with an OSCAR package release.
ACKNOWLEDGMENT:
Research supported by the Mathematics, Information, and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Science, U. S. Department of Energy, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.
Resources:
Oak Ridge National Laboratory. Project C3: Cluster Command and Control (C3) home page
Brian Luethke, Thomas Naughton, and Stephen Scott, C3 Power Tools: The Next Generation..., Proceedings of the Austrian-Hungarian Workshop on Distributed and Parallel Systems (DAPSYS 2002), Linz, Austria, September-October 2002
System Installation Suite Project