Large-scale clusters research - First HEPiX Large-Scale Cluster Computing Workshop

Alan Silverman
CERN

Dane Skow
Fermi National Accelerator Laboratory

High energy physics and other computing communities will, in the near term, deploy clusters consisting of thousands of processors and serving hundreds to thousands of independent users. That development requires an expansion of reach in both dimensions (processors and users) by an order of magnitude beyond the capacity of current successful production facilities. The workshop's agenda addressed some of the challenges in building such large-scale cluster computing environments.

The workshop's primary goal was to address the needs of computational clusters serving the next generation of high-energy and nuclear physics (HENP) experiments, those of the next 5-10 years. By extension, the attendees identified areas where some investment of money and effort is likely to be needed. Specifically, the workshop aimed to:

  • Determine what tools exist that can scale up to cluster sizes foreseen for the next generation of HENP experiments (several thousand nodes);
  • Compare and record experiences gained with such tools;
  • Produce a practical guide to all stages of planning, installing, building, and operating a large computing cluster in HENP;
  • Identify and connect groups with similar interest within HENP and the larger cluster computing community.

Computing experts with responsibility for, or experience with, such large clusters were invited to the workshop. The clusters of interest were those equipping centers of the sizes of LHC (Large Hadron Collider) Tier 0 (thousands of nodes) or Tier 1 (at least 200-1000 nodes) as described in the MONARC (Models of Networked Analysis at Regional Centers for LHC Experiments) project. The 60 attendees came not only from various HENP sites worldwide, but also from other branches of science, including bio-medicine and various Grid projects, as well as from industry.

The attendees freely shared their experiences and ideas, and the proceedings were prepared from material collected by the conveners and offered by the attendees. Drawing on the same material, the conveners are producing a Guide to Building and Operating a Large Cluster. The Guide is intended to describe all phases in the life of a cluster, and the tools used or planned to be used. It will be publicized via the Web, presented at appropriate meetings and conferences, and kept up to date as more experience is gained. The plan is to hold a similar workshop in the fall of 2002, and to present the initial version of the Guide at that time. All the workshop material is available on the Web.

Key challenges

The meeting began with an overview of the challenges facing HENP. Matthias Kasemann, head of the Computing Division at Fermilab, described Fermilab's current and near-term Scientific Program, including Fermilab's participation in CERN's future LHC program, notably in the CMS (Compact Muon Solenoid) experiment. He also described Fermilab's current and future computing needs for its Run II experiments, pointing out where clusters (or computing farms, as they are sometimes popularly known) are already in use. He noted that the overwhelming importance of data in HENP's current and future generations of experiments had prompted the interest in data grids. He posed some questions for the workshop to consider:

  1. Should or could a cluster emulate a mainframe?
  2. How much could HENP compute models be adjusted to make most efficient use of clusters?
  3. Where do clusters not make sense?
  4. What is the real total cost of ownership of clusters?
  5. Could we harness the unused CPU power of desktops?
  6. How can clusters be used for high-I/O applications?
  7. How can clusters be designed for high availability?

Wolfgang von Rueden, head of the Physics Data Processing group in CERN's Information Technology Division, presented the LHC computing needs. He described CERN's role in the project, displayed the relative event sizes and data rates expected from Fermilab Run II and LHC experiments, and presented a table of their main characteristics. He pointed out the large increases in data expected and, consequently, the huge increase in computing power that must be installed and operated.

A second problem posed by modern HENP experiments is their geographical scope, with collaborators spread throughout the world requiring access to data and compute power. Von Rueden noted that typical HENP computing is more appropriately characterized as High Throughput Computing than as High Performance Computing.

The need to exploit national resources and reduce dependence on links to CERN has produced a multi-layered computing model (MONARC). That model is based on a large central site to collect and store raw data (Tier 0, at CERN), and multiple tiers holding data extracts and copies, each performing different stages of physics analysis. Those tiers include, for example, the various National Computing Centers (Tier 1), and even individual users' desks (Tier 4).


Figure 1. Overview of the MONARC RC topology.

Von Rueden concluded by showing where Grid computing will be applied. With a Grid infrastructure in place, a user could submit a job, and the Grid would find the most convenient places to run it, thereby optimizing the use of widely dispersed compute resources. It would also organize efficient access to data by caching, migrating, or replicating it as appropriate. The Grid would offer an authentication mechanism for accessing resources at various sites, and would expose each local site's resource allocation mechanism via common interfaces. In essence, the Grid infrastructure would run a submitted job, monitor its progress, attempt to recover from errors occurring during execution, and finally notify the user of the job's completion.
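The job lifecycle described above (site selection, data replication, execution, error recovery, and notification) can be illustrated with a toy sketch. This is not any real Grid middleware: the `Site` and `Job` classes, the `choose_site` and `run_job` functions, the site names, and the selection rule (prefer sites that already hold the data, then the most free CPUs) are all assumptions made for illustration, and authentication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical Grid site; every field is an illustrative assumption."""
    name: str
    free_cpus: int
    datasets: set = field(default_factory=set)
    fail_first: bool = False  # simulate one transient failure at this site


@dataclass
class Job:
    """A hypothetical job that needs one named dataset."""
    name: str
    dataset: str
    log: list = field(default_factory=list)


def choose_site(job, sites, exclude=()):
    """Prefer sites already holding the data, then those with most free CPUs."""
    candidates = [s for s in sites if s.free_cpus > 0 and s.name not in exclude]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (job.dataset in s.datasets, s.free_cpus))


def run_job(job, sites, max_attempts=3):
    """Place the job, recover from a failure by resubmitting, then notify."""
    tried = []
    for attempt in range(1, max_attempts + 1):
        site = choose_site(job, sites, exclude=tried)
        if site is None:
            break
        if job.dataset not in site.datasets:
            site.datasets.add(job.dataset)  # replicate the data to the chosen site
            job.log.append(f"replicated {job.dataset} to {site.name}")
        job.log.append(f"attempt {attempt}: running on {site.name}")
        if site.fail_first:  # simulated error during execution
            site.fail_first = False
            tried.append(site.name)
            job.log.append(f"error on {site.name}, resubmitting elsewhere")
            continue
        job.log.append(f"completed on {site.name}")  # the "notification"
        return site.name
    job.log.append("failed: no usable site")
    return None


if __name__ == "__main__":
    sites = [
        Site("site_a", free_cpus=10, datasets={"run2_raw"}, fail_first=True),
        Site("site_b", free_cpus=50),
    ]
    job = Job("reco", dataset="run2_raw")
    run_job(job, sites)
    print("\n".join(job.log))
```

In this sketch the job first lands on the site that already holds its data, fails there once, and is then resubmitted to the second site after the dataset is replicated. A real scheduler would weigh many more factors, such as network cost, queue length, and site policy.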

Error amplifier

The rest of the three days included formal presentations of clustering as seen by some large sites, such as CERN, Fermilab, and SLAC, and also from smaller sites without on-site accelerators of their own, such as NIKHEF in Amsterdam and CCIN2P3 in Lyon. However, the largest part of the workshop consisted of a series of interactive panel sessions, each seeded with questions and discussion topics, and each introduced by a handful of short talks. Full details of these sessions, including most of the overheads presented during the workshop, are available on the workshop Web site.

Many tools were highlighted during the workshop: some commercial, some developed locally, and some adopted from the open source community. In choosing whether to use commercial tools or develop one's own, it should be noted that so-called "enterprise packages" are typically priced for commercial sites where downtime is expensive and has a quantifiable cost. They also usually carry considerable initial installation and integration costs. On the other hand, one must not forget the often high ongoing costs of home-built tools, or their vulnerability to the loss or reallocation of personnel.

There were discussions on how various institutes and groups performed monitoring, resource allocation, system upgrades, problem debugging, and other tasks associated with running clusters. Some highlighted lessons they had learned and how they would improve a given procedure next time. One memorable quote from Chuck Boeheim of SLAC was that "a cluster is a very good error amplifier."
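One way to read the "error amplifier" remark, offered here as an interpretation rather than as something stated at the workshop, is that a fault that is rare on any single node becomes routine across thousands of nodes. The sketch below assumes that nodes fail independently with the same probability; the function name `prob_any_failure` and the per-node value of 0.001 are illustrative assumptions, not measured figures.

```python
def prob_any_failure(p_node, n_nodes):
    """Probability that at least one of n independent nodes is in a failed state."""
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Assumed per-node failure probability; not a figure from the workshop.
    p = 0.001
    for n in (1, 100, 1000, 5000):
        print(f"{n:5d} nodes: {prob_any_failure(p, n):.3f}")
```

Under these assumptions, the chance that something is broken somewhere in the cluster approaches certainty as the node count reaches the thousands, which is why procedures that work at small scale may need rethinking for larger installations.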

Different sites described their methods for installing, operating and administering their clusters. The "G word" - G standing for Grid, of course - cropped up often, but all agreed that it would need lots of work to implement something of general use. One of the panels described the three Grid projects of relevance to HENP, namely the European DataGrid project and the two US projects in this space - PPDG (Particle Physics Data Grid) and GriPhyN (Grid Physics Network).

A number of sites described how they access data. Several collaborations already operate worldwide "pseudo-grids" within an individual experiment. In this context one speaker referred to the existing SAM database for the D0 (D Zero) experiment at Fermilab. Such systems already point toward the issues of reliability, allocation, scalability, and optimization that the more general Grid will face.

The workshop ended with the delegates agreeing to reconvene in approximately 18 months. Returning to the questions posed at the start by Matthias Kasemann: It is clear that clusters have replaced mainframes in virtually all of the HENP world. However, cluster administration, in particular, is far from simple, and poses increasing problems as cluster sizes grow. In-house support costs must be balanced against bought-in solutions, not only for hardware and software, but also for operations and management. Finally, there are already several solutions, and a number of practical examples, of the use of desktops to increase overall available computing power.

Resources

CERN (European Organization for Nuclear Research)
Fermi National Accelerator Laboratory
Large Hadron Collider at CERN
The MONARC project
Fermilab's High Energy Physics Information Center
The Compact Muon Solenoid Experiment
The RUN II experiments
The D0 Experiment
