Different Types of Clusters
The processing and analysis of the large amounts of data produced by
the NA49 experiment (about 50 TB) is done on several different clusters.
Each stage of the data processing cycle requires a different kind of cluster.
Although all clusters are built from commodity components, each one is tuned
for its specific main task. The following types of clusters
are in use:
-
Batch Data Processing
Jobs on this cluster typically read raw experiment data from tape,
process the data and write the results back to tape. Each cluster node
runs an independent job, and jobs are scheduled and managed by a batch scheduling
system such as NQS or LSF. Nodes in this type of cluster need good CPU performance,
disks large enough to hold the input and output data, and good connectivity
to the mass storage (tape silo) system. Dual-CPU SMP systems perform well
and make it easy to double the cluster's CPU power. Typical jobs run well
in 128 MB of RAM (i.e. 256 MB for a dual-CPU machine). Inter-node communication
performance is not critical, since all jobs run independently.
-
Interactive Data Analysis
This cluster is used to analyse and mine the data produced by the data
processing cluster. The main goal of an interactive cluster is to traverse
very large databases (several hundred GB) as fast as possible.
The data analysis program typically runs in parallel on all nodes in the
cluster to achieve interactive response times. The nodes in this cluster
need excellent CPU performance, large memories (to cache as much
data as possible), plenty of fast local disk (for quick access
to the data), and a high-performance, low-latency, switched inter-node network
(for efficient inter-process communication).
-
Monte Carlo Simulation
Monte Carlo programs are used to simulate detectors. Simulation
jobs typically create output data in a format equivalent to the raw data
produced by a detector. Each cluster node runs an independent simulation
job, and jobs are scheduled and managed by a batch scheduling system (just
as on the Batch Data Processing cluster). Apart from possibly smaller
local disks, the simulation cluster closely resembles a data processing
cluster.
-
Workgroup Services
This cluster provides interactive login services for a large number
of users. Tasks range from reading news to program development. The cluster
typically consists of nodes recycled from the other clusters.
In this joint project we focus on the first three types of clusters.
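On the Batch Data Processing cluster, jobs are independent, so a scheduler only has to fill free CPU slots. The following is a minimal sketch of that idea, not how NQS or LSF actually work internally. It assumes each CPU of a dual-CPU node is one job slot (consistent with the 128 MB per job / 256 MB per dual-CPU node figures above); the names `Node`, `dispatch` and `JOB_RAM_MB` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

JOB_RAM_MB = 128  # typical per-job memory footprint quoted in the text


@dataclass
class Node:
    """Hypothetical batch node; one job slot per CPU is an assumption."""
    name: str
    cpus: int = 2  # dual-CPU SMP node
    running: list = field(default_factory=list)

    @property
    def ram_mb(self) -> int:
        # Memory sized so that every CPU can host one 128 MB job.
        return self.cpus * JOB_RAM_MB

    def has_free_slot(self) -> bool:
        return len(self.running) < self.cpus


def dispatch(queue: deque, nodes: list) -> dict:
    """Hand queued jobs to free CPU slots, one node at a time.

    Jobs never communicate, so placement only has to respect free slots;
    anything that does not fit stays queued for a later scheduling pass.
    """
    placement = {}
    for node in nodes:
        while queue and node.has_free_slot():
            job = queue.popleft()
            node.running.append(job)
            placement[job] = node.name
    return placement


if __name__ == "__main__":
    nodes = [Node("node01"), Node("node02")]
    queue = deque(["run1", "run2", "run3", "run4", "run5"])
    print(dispatch(queue, nodes))  # run1 to run4 placed, run5 still queued
    print(nodes[0].ram_mb)         # 256
```

Because no job depends on another, the placement order is arbitrary; this is why inter-node network performance does not matter for this cluster type.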
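On the Interactive Data Analysis cluster, the analysis program runs in parallel on all nodes, each scanning the data on its own local disk, and the partial results are then combined. A minimal sketch of this scan-and-merge pattern follows. It is an illustration only: threads stand in for cluster nodes (a real setup would run one process per node and collect results over the network, and Python threads do not give true CPU parallelism), and the "analysis" is a hypothetical histogram of one value per event.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(events, n_nodes):
    """Distribute events over n_nodes local partitions (one per node)."""
    return [events[i::n_nodes] for i in range(n_nodes)]


def scan_partition(events, bin_width):
    """Work done on one node: histogram the values held on its local disk."""
    hist = Counter()
    for value in events:
        hist[int(value // bin_width)] += 1
    return hist


def parallel_histogram(partitions, bin_width=1.0):
    """Scan every partition concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(scan_partition, partitions,
                            [bin_width] * len(partitions))
        total = Counter()
        for partial in partials:
            total.update(partial)  # merging is just adding counts per bin
    return total


if __name__ == "__main__":
    data = [0.5, 1.2, 1.7, 2.9, 3.1, 0.1]
    parts = split_round_robin(data, n_nodes=3)
    print(parallel_histogram(parts))
```

The merge step is cheap compared with the scans, so response time is dominated by how fast each node can read its partition; that is why the text stresses memory for caching and fast local disk.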
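On the Monte Carlo Simulation cluster, each node runs an independent job whose output has the same format as raw detector data. The sketch below illustrates both points under loudly stated assumptions: the `RawHit` record layout, the toy hit and pulse-height model, and the per-job seeding scheme (`base_seed + job_id`) are all hypothetical and not taken from the NA49 software.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RawHit:
    """Hypothetical raw-data record, identical for real and simulated events."""
    event: int
    channel: int
    adc: int


def simulate_job(job_id, n_events, n_channels=64, base_seed=1998):
    """One independent simulation job (toy model, assumed seeding scheme).

    Each job derives its own seed from job_id, so jobs can run on any node
    in any order and a given job's output is reproducible.
    """
    rng = random.Random(base_seed + job_id)
    hits = []
    for event in range(n_events):
        for _ in range(rng.randint(1, 5)):  # toy hit multiplicity per event
            channel = rng.randrange(n_channels)
            adc = max(0, round(rng.gauss(100.0, 15.0)))  # toy pulse height
            hits.append(RawHit(event, channel, adc))
    return hits


if __name__ == "__main__":
    # Each call would be one batch job on a separate simulation node.
    for job_id in range(3):
        hits = simulate_job(job_id, n_events=100)
        print(job_id, len(hits))
```

Since the output uses the same record type as real data, downstream processing can treat simulated and measured events alike, which is the point made in the text.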
Contact: Fons Rademakers
Last update 26/7/98 by FR
Copyright © 1998 Hewlett-Packard & GSI
All rights reserved.