David Lifka
Cornell Theory Center
The Cornell Theory Center (CTC) is an interdisciplinary research support center at Cornell University with offices in Ithaca, New York, and New York City. Our primary mission is to provide a leading edge computational resource for the Cornell research community. During the past 15 years, we have formed strategic technical partnerships with companies such as Dell, IBM, SGI, Intel, and others to provide state-of-the-art hardware and software to our user community.
In May of 2000, CTC made a strategic, and somewhat controversial, decision to initiate a collaboration with Microsoft, designing a production computing facility that operates entirely within the Windows environment. The factors that motivated this change were reducing the size of our systems administration staff; lowering our total cost of ownership (TCO); improving the ease of use and ease of management of our high-performance computing (HPC) systems; and expanding the accessibility of HPC.
Today the Cornell user community consumes on average 60,000 wall-clock hours of computer time per week on our resources, which, combined, have a performance capability of approximately two teraflops, based on High-Performance Linpack (HPL) results. This is more capability than we have ever had, and it continues to grow every year.
More Efficient HPC
This new direction for HPC has dramatically reduced our TCO, allowing us to redirect our resources from staff devoted to keeping the hardware running smoothly to staff devoted to assisting researchers. We have been able to put more emphasis on ensuring that our users are productive in their research, and on testing and developing new programming strategies.
When we moved to an all-Windows environment, we leveraged the same systems staff to maintain a single security domain, Active Directory, not only for our HPC servers, but also for our infrastructure machines, desktops, and laptops. CTC has approximately three full-time employees who maintain nearly 1000 servers housed in two locations. Providing Windows-based HPC and visualization as a campus resource also brought new classes of users to HPC. We now provide HPC to the Johnson Graduate School of Management, the Cornell Institute for Social and Economic Research, architectural researchers and students, and other groups on campus that previously did not consider HPC an accessible tool.
Standardizing on one operating system further improved our TCO by eliminating the need for two desktop workstations on our staff members' desks. While staff members preferred handling day-to-day issues on Windows machines, largely because of the friendly interface, technical staff did not believe that the Windows machines were sophisticated and powerful enough to handle HPC problems. However, the performance of Intel-based machines has caught up with, and frequently surpassed, that of the more traditional HPC RISC workstations.
In addition, the vast number of applications and development tools available for the Windows platform offers users tremendous advantages and opportunities. Having a laptop or desktop with the same operating system, development tools, and third-party applications as the HPC resources reduces the complexity and learning curve for HPC application development. As a result, all CTC staff members now have one high-end laptop for all of their computing needs. Making their work environment portable allows progress to be made even when disconnected from the network, which improves productivity and, therefore, TCO. Consultants who are focused on high-end performance tuning and optimization typically have dual-processor Xeon-based machines as well, for long-running simulations and tests.
Toward a Seamless HPC Environment
People who develop tools for high-performance computing today are stuck in a rut. For the past 15 years, computer scientists have focused on tweaking cryptic scheduling systems and complex parallel programming libraries. Although great work has been done, the evolution of tools and software for HPC has slowed down in recent years.
As a result, a typical HPC system user is not only a sophisticated research scientist and expert in his or her own field, but must also be a proficient computer or computational scientist. These are clearly special people. When people can go to their banks' Web sites and determine their personal financial risk for their retirement portfolios on demand without having to know in which programming language the code is written, on what operating system that code runs, or how many machines are needed to provide a reasonable response time, we will have achieved seamless HPC.
When HPC becomes that easy, it will create a volume market. Professor Thomas Coleman, Director of CTC, and his research group, have already developed solutions for the financial services industry that ensure that more people have access to the benefits of HPC. Put in the hands of more researchers, this type of HPC will result in breakthroughs for science and society.
Web services in HPC
Web services offer an innovative tool for seamless HPC environments, providing all the utility that grids have promised for years, in addition to other key features. Web services are based on open standards (SOAP, XML, and HTTP), ensuring that they remain agnostic to the tools and languages used to develop or consume Web services, and to the types and locations of the machines that serve them. Microsoft and IBM have developed applications demonstrating just such capabilities.
The grid community has also embraced Web services, as is evident from their recent OGSA and OGSI efforts and the convergence towards the use of Web services for core grid technology. Web services also offer another "dream-come-true" that computer scientists have sought for the past 15 years: self-documenting code. This is a big win for code reuse and collaboration.
Today, CTC offers its users customized Web sites that allow researchers to authenticate and describe the problem they need to solve via Web pages. Researchers are asked only questions about the research problems they are trying to solve, not questions about HPC resource use. The answers, provided via the Web site, are used to launch an appropriate job to solve the problem. Results are provided to the researchers either in the form of an email or a Web page with downloadable data.
The required jobs are started by Web services behind the scenes, and may invoke one or more Web services on various servers or workstations at different locations. The jobs can also be submitted to a cluster scheduling system to solve tightly coupled problems on dedicated HPC cluster nodes.
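The routing described above can be illustrated with a minimal sketch. It is not CTC's implementation: the `JobRequest` fields and the two back-end functions are hypothetical stand-ins, and the only point is that the researcher's answers describe the problem while the service decides where it runs.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Answers collected from the researcher's Web form (hypothetical fields)."""
    user: str
    problem: str
    tightly_coupled: bool  # True when the solver needs dedicated cluster nodes


def call_web_service(request: JobRequest) -> str:
    # Stand-in for invoking a Web service on some server or workstation.
    return f"web-service:{request.problem}"


def submit_to_scheduler(request: JobRequest) -> str:
    # Stand-in for handing the job to a cluster scheduling system.
    return f"cluster-queue:{request.problem}"


def launch(request: JobRequest) -> str:
    """Choose a back end from the problem description, not from HPC details."""
    if request.tightly_coupled:
        return submit_to_scheduler(request)
    return call_web_service(request)
```

In this sketch, a loosely coupled request goes to a Web service, while a tightly coupled one is queued on the cluster. The researcher never chooses between them.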
The Computational Biology Service Unit at CTC is a great example of the use of Web services in HPC (see Resources). This system for solving research problems finally allows researchers to focus on their research, not on the HPC systems they are using. The complexity of the computing systems and programming environments is transparent to them. Most researchers agree: They don't care how they do the computing, they just want to get results. The easier, the better.

Figure 1: Browser-based HPC job submission. © Cornell Theory Center
Adaptive Computing with Web Services
Web services can also provide support for grid computing, and are particularly useful for task-farming, cycle-scavenging applications, which are the dominant type of grid application today. Recently, grid applications have moved beyond cycle scavenging, and toward industrial-strength applications. An example of that trend is the Adaptive Software Project, an NSF-funded initiative, led by Professors Keshav Pingali and Anthony Ingraffea of the Computational Materials Institute at CTC (see Resources).
One of the goals of the Adaptive Software Project is to provide a geographically distributed system where each of the collaborating teams provides a part of the solution, often requiring a tightly coupled application to be run on a local cluster or parallel resource. The trick is to put all these components together seamlessly so that a researcher or graduate student at any of the collaborating institutes can easily get a beginning-to-end solution without having to understand the locations, software, and operating systems involved, or the order in which each part of the solution needs to be invoked.
The system developed uses Web services not only to orchestrate the flow of the solution from beginning to end, but also to provide a mechanism to federate all the collaborators' computing resources without having to coordinate user account management between the various institutions.
Users of the system obtain X.509 certificates like those available from VeriSign. CTC users are able to get them from a local Microsoft Windows Server 2003 machine acting as a certificate server. Users register their certificates with their local site, and are entered into a database that a Web service later queries to authenticate them. When a job is ready to use the resource at one of the sites, it invokes a Web service with the necessary parameters and the appropriate X.509 certificate. The Web service validates the submitting user's certificate against the local database, and if that certificate is valid, submits an appropriate job to the local cluster resource to solve that portion of the problem.
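The register-then-validate flow can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `SiteRegistry` class, the use of a SHA-256 fingerprint as the database key, and the `submit` callback are all hypothetical, and a real service would also have to check the certificate's signature chain, validity period, and revocation status.

```python
import hashlib


class SiteRegistry:
    """In-memory stand-in for a site's certificate database (hypothetical)."""

    def __init__(self):
        self._users_by_fingerprint = {}

    @staticmethod
    def fingerprint(cert_bytes: bytes) -> str:
        # Key each certificate by the SHA-256 digest of its encoded bytes.
        return hashlib.sha256(cert_bytes).hexdigest()

    def register(self, user: str, cert_bytes: bytes) -> None:
        self._users_by_fingerprint[self.fingerprint(cert_bytes)] = user

    def lookup(self, cert_bytes: bytes):
        # Returns the registered user name, or None if the certificate is unknown.
        return self._users_by_fingerprint.get(self.fingerprint(cert_bytes))


def handle_job_request(registry, cert_bytes, params, submit):
    """Validate the caller's certificate locally, then submit the job."""
    user = registry.lookup(cert_bytes)
    if user is None:
        raise PermissionError("certificate is not registered at this site")
    return submit(user, params)
```

Here `submit` stands for whatever hands the job to the local cluster. Passing it in as a parameter keeps the authentication step independent of any particular scheduler.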
To prevent administrators at the various sites from having to coordinate their local certificate databases with all of the collaborators, a Security Token Server is used. This server allows administrators to define which other certificate databases they trust. A failed lookup at one site causes the Security Token Server to check the trusted sites to see if that certificate is valid at a collaborator's site. If so, the user is authenticated at the trusting site. Using Web services, the Adaptive Software Project team has seamlessly combined heterogeneous computing resources and data sources in an industrial-strength grid application by providing a secure federation and an interoperable mechanism to invoke jobs at each site.
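The trust fallback can be sketched in a few lines. The `Site` and `SecurityTokenServer` classes below are hypothetical and reduce each certificate to a plain identifier. They show only the lookup order: the local database first, then the databases the local administrator has chosen to trust.

```python
class Site:
    """A collaborating site with its own set of registered certificate IDs."""

    def __init__(self, name: str):
        self.name = name
        self.registered = set()


class SecurityTokenServer:
    """Hypothetical sketch of a trust-based fallback lookup."""

    def __init__(self):
        self._trusted = {}  # site name -> list of sites that site trusts

    def trust(self, site: Site, other: Site) -> None:
        # Trust is directional: `site` accepts certificates known to `other`.
        self._trusted.setdefault(site.name, []).append(other)

    def authenticate(self, site: Site, cert_id: str):
        """Return the name of the site that vouches for cert_id, or None."""
        if cert_id in site.registered:
            return site.name
        for other in self._trusted.get(site.name, []):
            if cert_id in other.registered:
                return other.name
        return None
```

Because trust is recorded per site and in one direction only, each administrator decides independently whose users to accept, and no site has to copy another site's database.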
Working with Petabyte Data Sets
Will Web services, combined with the competitive advantages of Windows-based HPC that allow users to seamlessly leverage HPC resources from their desktops, render supercomputing centers obsolete? Jim Gray, Gordon Bell, and George Spix (all with Microsoft Research) have predicted that centers of the future must shift their focus towards serving huge amounts of data. As users have better and more sophisticated mechanisms to solve their computational problems, they will produce and consume much more data. While those same users can often afford to purchase machines to support their computational needs, supporting their data needs will become increasingly difficult.
Consider the international astronomy research community that wants to study more than two petabytes of data per year. Maintaining a local copy of that data is not affordable, and the researchers must be able to access that data efficiently from all over the world. Supercomputing centers can meet this new need by offering database technology served by Web services. Web Services can provide interfaces accessible to any platform so that astronomers can specify the data parameters in which they are interested.
The Web service can authenticate a user, and employ resource-intensive searching or data mining techniques to find only the data the astronomers are looking for and return it to them in the form of XML. Jim Gray has worked with the Sloan Digital Sky Survey team to solve such a problem using SQL Server and Web services (see Resources). He has also helped Professors Johannes Gehrke (Computer Science) and Jim Cordes (Astronomy), of Cornell, to use a similar strategy for the Arecibo Observatory (see Resources). CTC has recently started several other similar research collaborations.
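A server-side query of this kind might look like the sketch below. The catalog rows, field names, and `query_catalog` function are invented for illustration and have no connection to the actual Sloan or Arecibo services. Authentication is omitted. The sketch shows only the idea of filtering on the server and returning just the matching records as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog rows; a real service would query a database instead.
CATALOG = [
    {"id": "obj1", "ra": 10.5, "dec": -1.2, "magnitude": 17.3},
    {"id": "obj2", "ra": 45.0, "dec": 12.8, "magnitude": 21.9},
    {"id": "obj3", "ra": 12.1, "dec": 3.4, "magnitude": 19.0},
]


def query_catalog(catalog, ra_min, ra_max, max_magnitude):
    """Return objects inside the RA range and at most max_magnitude, as XML."""
    root = ET.Element("results")
    for row in catalog:
        if ra_min <= row["ra"] <= ra_max and row["magnitude"] <= max_magnitude:
            # XML attribute values must be strings.
            ET.SubElement(root, "object", {k: str(v) for k, v in row.items()})
    return ET.tostring(root, encoding="unicode")
```

A client on any platform can parse the returned XML with its own tools, so only the selected records, not the full data set, cross the network.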

Figure 2: The Java-based Mirage application can consume Web services data from the Virtual Observatory. © Johns Hopkins University
Summary
CTC has been extremely successful in providing a production-quality computational resource for Cornell with Microsoft Windows. CTC's partnership with Microsoft has allowed us to look at new and better ways of doing HPC, and to make HPC accessible to more users. Data-intensive computing, seamless HPC, and sophisticated resource management and scheduling tools to support them are the primary technologies on which we are currently focused. The benefits of this new direction in HPC are illustrated every day by Cornell researchers.
Resources
Computational Biology Service Unit at the Cornell Theory Center
The Cornell Computational Materials Institute
ITR/ACS Adaptive Software Project
Home page of the Virtual Observatory Web services
The Arecibo Observatory Home Page
The Cornell Theory Center Windows clusters resources page