The use of graphics processors and heterogeneous computing architectures

Olga Abramova, Yulia Itkulova, Dmitry Maryin, Victor Malyshev, Elena Moiseeva, Nail Gumerov

Objective:

Development and implementation of heterogeneous algorithms for the numerical simulation of large-scale problems in liquid microflows and molecular dynamics.

Graphics Processing Unit NVIDIA Tesla C2070

Graphics processing units (GPUs) have been used for scientific computing for about ten years, and the rapid development of computer technology in recent years suggests that this area is undergoing revolutionary change. At first, programming GPUs for general scientific purposes was very time-consuming. The emergence of NVIDIA CUDA in the mid-2000s made it possible to program GPUs in languages such as C or Fortran. This produced a qualitative leap in the use of GPUs and a "democratization" of parallel programming. At present, GPUs and GPU clusters offer the best economic and environmental performance, measured, for example, by the number of operations per unit of energy consumed or per unit of equipment cost. They thus realize the concept that "a real supercomputer should be small" and provide an alternative to large, expensive clusters built from CPUs only.

Performance of Graphics Processing Units (GPU) and Central Processing Units (CPU)

The architecture of the GPU restricts which algorithms can actually be accelerated. For example, simple massively parallel computations (simple batch processing of graphical data, matrix-vector multiplication, etc.) can be accelerated by factors of tens to hundreds. By contrast, calculations that rely on persistent irregular ("incoherent") access to global memory can be accelerated only by a factor of a few, or not at all.
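To illustrate why scattered access to global memory limits the speed-up, the following Python sketch counts how many memory segments a group of threads touches in one load. It is a simplified model, not a measurement: the group size of 32 threads, the 4-byte element size, and the 128-byte segment size are assumptions chosen to resemble typical NVIDIA hardware, and the function name `segments_touched` is hypothetical.

```python
def segments_touched(indices, element_bytes=4, segment_bytes=128):
    """Count the distinct memory segments covered by one group of loads.

    indices: array element index read by each thread in the group.
    The sizes are illustrative assumptions, not properties of a specific GPU.
    """
    return len({(i * element_bytes) // segment_bytes for i in indices})


THREADS = 32  # assumed number of threads issuing loads together

# Neighbouring threads read neighbouring elements: one segment serves all.
contiguous = segments_touched(range(THREADS))

# Each thread reads an element 32 positions away from its neighbour:
# every load falls into its own segment.
strided = segments_touched(i * 32 for i in range(THREADS))

print(contiguous, strided)  # prints "1 32"
```

In this model the strided pattern moves 32 times more data through the memory system to deliver the same number of useful values. Overhead of this kind can erase most of the gain from parallel execution.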

Developing a "heterogeneous" algorithm can help accelerate sufficiently complex algorithms (e.g., the Fast Multipole Method). By a heterogeneous algorithm we mean an algorithm split into parts that exploit different architectures (e.g., a multi-core CPU and a GPU). Heterogeneous algorithms have clear advantages, such as full use of all available resources (in a homogeneous algorithm, where only one processor architecture does the work, processors of the other architecture sit idle), and they are considered one of the promising directions in this field.
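The benefit of keeping both architectures busy can be sketched with a toy timing model. The Python code below assumes that the work consists of independent, equally sized items and that each device processes them at a fixed rate. The function names and the rates are hypothetical, and the model is not the laboratory's algorithm: a real heterogeneous algorithm may instead assign different stages of a method to the architecture that suits each of them.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Divide n_items so that both devices finish at about the same time.

    Rates are in items per unit time (hypothetical values for illustration).
    Returns (cpu_items, gpu_items).
    """
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


def wall_time(cpu_items, gpu_items, cpu_rate, gpu_rate):
    """Both devices run concurrently, so the slower one sets the total time."""
    return max(cpu_items / cpu_rate, gpu_items / gpu_rate)


N, CPU_RATE, GPU_RATE = 1000, 1.0, 9.0  # assumed workload and device rates

gpu_only = wall_time(0, N, CPU_RATE, GPU_RATE)
cpu_items, gpu_items = split_work(N, CPU_RATE, GPU_RATE)
heterogeneous = wall_time(cpu_items, gpu_items, CPU_RATE, GPU_RATE)

print(cpu_items, gpu_items)      # prints "100 900"
print(heterogeneous < gpu_only)  # prints "True"
```

Even with a CPU assumed to be nine times slower than the GPU, giving it a proportional share of the items shortens the total time in this model, because neither device sits idle.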

The heterogeneous algorithms developed and implemented in the laboratory for solving large-scale problems of molecular dynamics and liquid microflows have no analogues in the world and have proved efficient on systems consisting of several GPUs and multi-core CPUs. The laboratory plans to adapt the algorithms to heterogeneous clusters containing many GPUs. The laboratory has sufficient resources to implement and test the algorithms: each employee works on a computer with 12 CPU cores and a powerful NVIDIA Tesla C2050 GPU, and the laboratory also has a high-performance cluster of four computing nodes, each equipped with four NVIDIA Tesla C2075 / K20c GPUs.