- Many of today’s scientific achievements would not be possible without supercomputers.
- The DEEP project has been working to establish improved architectures for the next generation of ‘exascale’ supercomputers.
- The DEEP project recently came to an end, but has been succeeded by a new project called ‘DEEP-ER’.
- The researchers behind these projects have created a prototype system and have been testing it with real-world HPC applications.
Many of today’s scientific achievements would not be possible without supercomputers; just ask researchers in any field, from physics to geology to the life sciences. With science rapidly advancing, high-performance computing (HPC) needs to evolve, too. After all, for many scientific problems, today’s supercomputers are not fast enough, or they are not well adapted to the complexity of the problem at hand.
This is why the HPC community is aiming to deliver the next generation of supercomputers by the mid-2020s: so-called ‘exascale’ computers capable of 10^18 floating point operations per second (FLOPS). Despite the clear trend towards heterogeneous architectures for future machines that make use of accelerator processors, it is not yet clear what exactly exascale architectures will look like.
One of the biggest challenges is to find a way to deal with Amdahl’s law. In its simplest form, this law states that the potential speedup of a parallelized algorithm is limited by the portion that must be performed sequentially.
The keyword here is ‘parallelized’: scientific simulations need to be extremely well parallelized in order to make efficient use of supercomputing resources. Those parts that cannot be parallelized limit the performance.
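Amdahl's bound is easy to see in a few lines of code. The sketch below is ours, not the project's: plain Python with an illustrative serial fraction of 1%, showing how the speedup saturates no matter how many processors are added.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Upper bound on speedup when a fixed fraction of the work must run serially."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# With just 1% serial code, no number of processors pushes the speedup past 100x.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} processors -> speedup {amdahl_speedup(0.01, n):6.1f}")
```

Going from ten thousand to a million processors barely moves the result, which is exactly why the non-parallelizable parts of a code dominate at exascale.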
Up to now, the answer has been to have separate systems: massively parallel systems for applications that are highly scalable (like many physics simulation codes, for instance) and commodity ‘clusters’ for those that are not. However, with the landscape of HPC users rapidly changing, data centers are confronted with a more diverse user base that brings even more diverse simulation codes.
Taking a DEEP look at the challenge
DEEP, a European-funded exascale research project, proposes a way out of this dilemma. It puts forward the so-called ‘cluster-booster’ approach, an innovative effort in heterogeneous computing that merges the hitherto separate lines of massively parallel and commodity cluster systems into a single system.
Code segments that can only be parallelized up to a limited concurrency level stay on the cluster. This part of the hardware is equipped with general-purpose processor cores that can handle more complex computations. The highly parallelizable parts of the simulation run on the booster, which is equipped with accelerator processor cards (Intel Xeon Phi™ in the case of DEEP) and enables very fast, highly parallel computation of specific tasks, with the highly beneficial side effect of also being tremendously energy-efficient.
Such a cluster-booster architecture is perfectly suited for simulation applications that have intrinsically different code structures or combine different complex models, such as multi-physics or multi-scale simulations. The system allows for dynamic association of different kinds of computing resources, so as to best match workload needs. This opens up new avenues for the architecture of efficient next-generation HPC systems.
From theory to prototype
DEEP is not only a theoretical concept. Within less than four years, the project has created a fully functioning prototype system based on substantial technological innovation. It is capable of 500 teraFLOPS peak performance, a remarkable figure for a research project of this kind. While the cluster part is a commodity system, the European hardware team designed and constructed the entire DEEP booster from scratch.
The consortium is already working on a second prototype, which addresses further exascale challenges in the follow-up project DEEP-ER.
Since the hardware architecture is admittedly relatively complex, the project team has also developed a dedicated DEEP system software stack. The Message Passing Interface (MPI) programming paradigm, implemented with ParaStation MPI in combination with an improved version of the OmpSs task-based programming environment, ensures ease of use for application programmers: they simply request the necessary resources without having to bother about the underlying hardware architecture.
The rest is done transparently and dynamically by the software stack. Relying on the MPI paradigm here was a strategic decision, since MPI is most widely used within the HPC community — one more reason why this system is beneficial for a diverse user base.
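As a loose analogy only (this is not the DEEP software stack; the pool sizes and the `submit` helper are purely illustrative), the division of labor can be sketched with two worker pools in Python. The programmer states a property of the task, and the routing to the right resource happens behind the scenes:

```python
from concurrent.futures import ThreadPoolExecutor

# Two resource pools standing in, very loosely, for the cluster and the booster.
cluster = ThreadPoolExecutor(max_workers=2)    # few general-purpose cores
booster = ThreadPoolExecutor(max_workers=16)   # many accelerator cores

def submit(task, scalable):
    """Route highly scalable tasks to the booster pool, everything else to the cluster."""
    return (booster if scalable else cluster).submit(task)

# The caller describes the task, not where it should run.
future = submit(lambda: sum(range(1000)), scalable=True)
print(future.result())  # prints 499500
```

The point of the analogy is the separation of concerns: the application declares how scalable a piece of work is, and the runtime decides which hardware resource executes it.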
Six real-world simulation applications from important scientific and engineering fields, such as human brain simulation, seismic imaging, and climate research, were selected to investigate and demonstrate the benefits of combining hardware, system software, and the programming model in this manner. During the project, these applications were highly optimized and drove the co-design process that led to the final hardware and software. They have also served to identify the main features of applications that benefit most from the DEEP concept.
Even though computer hardware has a notoriously short lifespan, meaning the current system will soon be outdated from a component point of view, the concept itself will significantly influence the development of future supercomputers. Most importantly, it puts the requirements of a diverse HPC user group into focus.