The Aurora supercomputer at Argonne National Laboratory recently achieved exascale performance, a milestone that dramatically enhances computational capabilities for scientific research. Installed in June 2023, Aurora now ranks among the fastest supercomputers in the world and has placed first on the HPL-MxP mixed-precision benchmark used to gauge AI performance. This landmark achievement signals a transformative era in high-performance computing, with far-reaching implications for fields such as climate modeling, cancer research, and green energy solutions.
Exascale computing refers to the ability to perform at least one quintillion (10^18) calculations per second. This level of performance equips researchers with the tools necessary to address some of the most complex challenges in science. By harnessing this extraordinary processing power, Aurora improves the accuracy, speed, and scope of scientific work compared with previous generations of supercomputers.
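To make that figure concrete, consider a quick back-of-the-envelope calculation; the sketch below uses a rough world-population estimate, purely for illustration:

```python
# Illustrative arithmetic only: what "one quintillion calculations per
# second" means in human terms.

EXA = 1e18               # operations an exascale system performs each second
WORLD_POPULATION = 8e9   # rough 2023 estimate, assumed for this example

seconds_per_person = EXA / WORLD_POPULATION        # 1.25e8 seconds each
years = seconds_per_person / (60 * 60 * 24 * 365)

# Everyone on Earth computing one operation per second would need
# roughly four years to match a single second of exascale work.
print(f"{years:.1f} years")  # -> 4.0
```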
To better understand the significance of this development, The Innovation Platform spoke with Mike Papka, Director of the Argonne Leadership Computing Facility and Deputy Associate Laboratory Director of Computing, Environment, and Life Sciences at Argonne National Laboratory.
Why Is Aurora’s Achievement So Significant?
Aurora’s exascale computing capability marks a significant leap in computational power. It allows diverse scientific tasks, from traditional modeling and simulation to data-intensive workflows and artificial intelligence (AI) applications, to run within a single cohesive system. Aurora’s architecture, which pairs Intel Xeon CPU Max Series processors with Intel Data Center GPU Max Series accelerators on every node, enables it to tackle intricate problems across multiple domains, including climate modeling, materials science, and energy research.
Papka explained, “This unprecedented computational capability allows scientists to conduct research previously deemed impossible due to resource limitations.”
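As a rough illustration of the pattern Papka describes, the Python sketch below couples a toy CPU-side simulation with a small AI surrogate model on a GPU. It is a conceptual example only, not Aurora’s actual software stack; the solver and the network are hypothetical stand-ins:

```python
# Conceptual sketch of a coupled simulation-plus-AI workflow, the kind
# of mixed workload a heterogeneous CPU/GPU system is built to run.
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def simulate_step(state: np.ndarray) -> np.ndarray:
    """Toy stand-in for a CPU-side physics solver (1-D diffusion)."""
    return state + 0.1 * (np.roll(state, 1) - 2 * state + np.roll(state, -1))

# Hypothetical AI surrogate: a small network that consumes each
# simulation state, running on the GPU when one is available.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64)
).to(device)

state = np.random.rand(64).astype(np.float32)
for _ in range(10):
    state = simulate_step(state)              # simulation step on the CPU
    x = torch.from_numpy(state).to(device)    # hand the result to the GPU
    prediction = surrogate(x)                 # AI inference on the same data
print("surrogate output shape:", tuple(prediction.shape))
```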
Key Technological Advancements
The Aurora supercomputer’s exascale status rests on several groundbreaking technological innovations. High-bandwidth memory, cutting-edge GPUs, and HPE’s Slingshot 11 interconnect have been instrumental in achieving this milestone.
The Slingshot 11 network provides nearly double the endpoint connections of any other large-scale system currently in operation. This setup ensures that Aurora’s more than 10,000 nodes can transmit vast amounts of data seamlessly, which is critical for maintaining high performance under intensive computational loads. With its top rankings in AI and traditional computing tasks, Aurora is well-positioned to lead the field.
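The sketch below, a minimal mpi4py example rather than anything Aurora-specific, shows the collective communication pattern whose performance hinges on the interconnect: every rank contributes a partial result, and the network combines them all.

```python
# Minimal MPI collective sketch; run with e.g. `mpiexec -n 4 python sketch.py`.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank computes a partial result on its own slice of the problem...
local = np.full(4, rank, dtype=np.float64)

# ...and the interconnect reduces all partials into a global sum visible
# on every rank. At Aurora's scale this collective spans more than
# 10,000 nodes, which is why endpoint bandwidth and latency dominate.
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)

if rank == 0:
    print("global sum:", total)
```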
Accelerating AI and Machine Learning
Aurora’s exascale computing power significantly accelerates advances in AI and machine learning. With its extensive memory capacity and tens of thousands of GPUs, Aurora can train large AI models with trillions of parameters. Papka pointed out that even during the testing phases, Aurora demonstrated remarkable results in the mixed-precision calculations vital to AI training workloads.
This capability will allow researchers to manage massive datasets effectively and create more advanced models, propelling breakthroughs in various scientific disciplines.
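Mixed-precision training itself is a general technique, and the sketch below shows its basic shape in PyTorch; it is not Aurora-specific code (Aurora runs Intel GPUs with their own software stack), just an illustration of the idea:

```python
# Minimal mixed-precision training loop using PyTorch's automatic
# mixed precision (AMP); falls back to full precision without a GPU.
import torch

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
loss_fn = torch.nn.MSELoss()

for step in range(3):
    x = torch.randn(32, 512, device=device)
    y = torch.randn(32, 512, device=device)
    optimizer.zero_grad()
    # The forward pass runs in reduced precision (e.g. float16) where it
    # is numerically safe, trading a little accuracy for speed and memory.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(x), y)
    # Gradient scaling keeps small float16 gradients from underflowing.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```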
Expanding Scientific Research Possibilities
Although Aurora is not yet operating at full capacity, it has already begun running real-world codes through projects such as the Argonne Leadership Computing Facility’s (ALCF) Early Science Program and the Exascale Computing Project. These initial applications cover critical areas like energy science, cancer research, and cosmology, and have already produced significant scientific results.
The advanced technology embedded in Aurora facilitates more detailed simulations and complex computations. It opens up new avenues for discoveries, particularly in energy science, which is crucial for tackling global energy challenges.
Addressing Development Challenges
The road to Aurora’s deployment was not without obstacles. Papka noted that delays stemmed from vendor decisions and supply chain issues exacerbated by the pandemic. These challenges highlighted the need for flexible acquisition strategies: the rigid models used previously became less viable in an environment where technology evolves so rapidly.
To bridge these delays, Argonne deployed interim systems such as the Polaris supercomputer to keep supporting scientific work, demonstrating the value of adaptable strategies in research computing.
Data Management and Energy Efficiency
Handling the immense volume of data that Aurora generates involves sophisticated systems like the Slingshot interconnect and a custom filesystem known as DAOS (Distributed Asynchronous Object Storage). This high-performance storage system operates alongside the facility’s global filesystem environment, enabling efficient data management and storage across Aurora’s extensive computation fabric.
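To see why an object store suits this workload, the toy sketch below (an in-memory illustration, emphatically not the DAOS API) captures the model’s two key traits: data is addressed by key rather than by file path, and writes can be issued asynchronously so computation does not stall waiting on storage.

```python
# Toy illustration of asynchronous object-store semantics.
from concurrent.futures import ThreadPoolExecutor

class ToyObjectStore:
    """In-memory stand-in for an asynchronous object store."""
    def __init__(self):
        self._objects = {}
        self._pool = ThreadPoolExecutor(max_workers=4)

    def put_async(self, key: str, value: bytes):
        # Returns immediately; the write completes in the background.
        return self._pool.submit(self._objects.__setitem__, key, value)

    def get(self, key: str) -> bytes:
        return self._objects[key]

store = ToyObjectStore()
future = store.put_async("sim/checkpoint/0007", b"...payload...")
future.result()  # block only when the write must be durable
print(store.get("sim/checkpoint/0007"))
```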
Moreover, Aurora prioritizes energy efficiency, using water cooling in place of traditional air cooling. The overall architectural layout is likewise engineered to minimize energy loss. These advances set a promising precedent for the environmental footprint of future supercomputing projects.
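Efficiency at this scale is usually compared in floating-point operations per watt, and a short calculation shows why it matters; the performance and power figures below are assumptions for illustration, not Aurora’s specifications:

```python
# Illustrative arithmetic only: comparing systems by FLOPS per watt.
perf_flops = 1.0e18    # assume an exaflop of sustained performance
power_watts = 50e6     # assume a facility draw of roughly 50 MW

gflops_per_watt = perf_flops / power_watts / 1e9
print(f"{gflops_per_watt:.0f} GFLOPS per watt")  # -> 20

# At this scale even a 10% efficiency gain saves megawatts, which is
# why cooling design and system layout matter so much.
```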
Collaborative Development Efforts
The Aurora project’s success is largely attributed to extensive collaboration among various organizations, including partnerships with industry giants like Intel and Hewlett Packard Enterprise (HPE). Close collaborations with sister facilities, such as the Oak Ridge Leadership Computing Facility (OLCF) and the National Energy Research Scientific Computing Center (NERSC), helped optimize development and deployment processes. Additionally, engagement with the Department of Energy’s Exascale Computing Project supported the evolution of exascale-ready tools and applications.
Looking Ahead
Aurora is designed to play a pivotal role in the next wave of exascale supercomputers, facilitating new scientific explorations and breakthroughs. The long-term vision is to enhance AI-enabled workflows and models, with targets set on clean energy research, drug discovery, and deeper insights into the universe.
Plans are underway for a subsequent system, Helios, which will incorporate lessons learned from the Aurora project to drive future advancements. This commitment to innovation ensures that computing capabilities will continue evolving, addressing the pressing demands of scientific inquiry.
As we move closer to realizing Aurora’s full potential, it is clear that this supercomputer is not just a milestone in technology but a pivotal tool for accelerating the pace of significant scientific breakthroughs.