
Accelerated Computing: “Computational Accelerator” Term Revisited

January 7th, 2013

The Fifth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2) brought together several researchers from academia working with accelerated computing technologies to discuss the future of computing accelerators, under the theme of hardware/software divergence in accelerator computing.

An interesting question was brought up: what do we consider to be a computational accelerator? On one side, it was postulated that “anything built to perform a specific type of computation is an accelerator.”  Examples of such accelerators include a vector instruction unit found in many modern CPUs, the floating-point units in IBM BG/P and Q, an H.264 media decoder integrated on a processor die, and, of course, GPUs.  On the other hand, some panelists felt that an accelerator is a stand-alone part added to a general-purpose computer to speed up general-purpose computations.  Examples of accelerators under this definition include GPUs, FPGAs, MIC boards, etc.  So, what do we really mean by a “computational accelerator”?

There are several closely related terms that are frequently used interchangeably when discussing speeding up computations.  Perhaps the most widely used is “hardware accelerator,” the use of specialized hardware to perform a computation faster than it can be done in software on a general-purpose microprocessor.  An H.264 media decoder fits this definition.  Another relevant term is “co-processor,” a supplemental unit attached to the main processor and tasked with executing a specific function or instruction.  A floating-point unit used to be a co-processor.  Sometimes “offload engine” is used to describe a piece of hardware designated for a particular function.  And of course “heterogeneous computing” is a widely used term for executing a single application across different types of processors, some of which are “accelerators.”  In this scenario, different parts of the application are executed by different types of processors.

“Computational accelerator” is a relatively new term, introduced in the mid-2000s to describe the use of specialized, but reprogrammable, hardware to improve the performance of computationally intensive software, particularly in the framework of scientific/engineering and high-performance computing.  Initially, it was used in the context of field-programmable gate arrays (FPGAs) employed to speed up the execution of selected parts of the code, namely computational kernels.  Such computational accelerators were envisioned as programmable co-processors, or hardware accelerators capable of performing relatively general-purpose computations.  But unlike co-processors and hardware accelerators with fixed functionality, they could be re-programmed at run time.  A good example of an early system with such capabilities is the Cray XD1, which used an FPGA as a programmable co-processor.

Later the term “computational accelerator” became synonymous with the technology used in the Roadrunner supercomputer – an add-on board containing IBM’s PowerXCell 8i processor.  The processor itself is a heterogeneous multi-core design containing two types of cores, but the way it was used in Roadrunner is what made it a computational accelerator.  Today AMD and NVIDIA GPUs and Intel’s MIC are the three leading computational accelerators.  While GPU-based accelerators have been around since 2007, MIC is a newcomer, but a very promising one.  Both are attached to the main CPU as add-on boards via the PCIe interconnect.  Both contain many processor cores as well as memory hierarchies separate from the host’s.  Both are reprogrammable on the fly and capable of sustaining substantially higher floating-point throughput than a modern microprocessor.
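The offload model described above (a host CPU and an accelerator with separate memories, connected by PCIe) can be sketched in CUDA.  This is an illustrative sketch only, not code from any system mentioned here; the kernel, array size, and launch configuration are arbitrary choices for the example.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative computational kernel: scale a vector on the accelerator.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main(void) {
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // The accelerator has its own memory hierarchy: data must be
    // moved across the PCIe link explicitly before the kernel runs.
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Offload the kernel to the accelerator: 256 threads per block.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    // Copy results back to host memory over PCIe.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}
```

Note that the kernel itself can be arbitrarily complex, but the code running on the device cannot touch the disk or network directly, which is exactly the functional limitation discussed in the concluding definition below.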

Lately both AMD and Intel have started to combine CPU cores with GPUs on a single die.  With this transition, the embedded GPUs are losing their status as computational accelerators.  Even though they are re-programmable, they are really becoming just hardware accelerators designed to speed up graphics output.  Unlike discrete GPUs, the embedded GPUs, at least in their current configuration, are not attractive for general-purpose computations: they are small, and their memory bandwidth is no better than that of the main CPU.

To conclude, I would like to propose the following definition: a computational accelerator is a reprogrammable, stand-alone processor capable of executing arbitrarily complex, but functionally limited, code, offering faster execution than the general-purpose processor.  The functional limitation comes from its inability to directly access some of the hardware resources typically available to the main processor, such as the disk or network interface.  It is not fixed hardware designed to accelerate the execution of a particular instruction (e.g., a vector unit) or a particular fixed function (e.g., H.264 media decoding).

About the author

Dr. Kindratenko is a Senior Research Scientist at the National Center for Supercomputing Applications and a lecturer in the Department of Electrical and Computer Engineering at the University of Illinois.  He received the D.Sc. degree from the University of Antwerp, Belgium, in 1997 and graduated from the State Pedagogical University, Kirovograd, Ukraine, in 1993.  Dr. Kindratenko’s research interests include high-performance computing and special-purpose computing architectures.  He has been working with application scientists to implement scientific codes on accelerator-based computing systems, focusing on the use of high-level languages and algorithm optimization techniques.  He is a Senior Member of IEEE and ACM.