OceanLite is an exascale supercomputer located in Wuxi, China, built on Sunway SW26010P (or “SW26010-Pro”[1]) processors.
System overview
Sources disagree on the size of OceanLite: it has 107,520,[2] 107,250,[3] or 107,136[4] compute nodes. Each node has a single SW26010P CPU, and the nodes are connected by a tapered fat-tree interconnect.
According to the system specification, it should have .[4] It was used in a Gordon Bell Prize-winning paper[2] that achieved 1.2 EF of FP32 performance.
It is unclear how the nodes and racks are laid out.
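To show how much the conflicting node counts matter, the sketch below multiplies each reported node count by each per-node FP64 peak reported for the SW26010P (13.8 TF[4] or 14.03 TF[1]). This is back-of-the-envelope arithmetic, not a figure taken from any of the sources; the function names are hypothetical.

```python
# Node counts and per-node FP64 peaks as reported by the cited sources.
NODE_COUNTS = [107_520, 107_250, 107_136]
NODE_PEAK_FP64_TF = [13.8, 14.03]


def system_peak_ef(nodes: int, node_peak_tf: float) -> float:
    """Aggregate peak in exaFLOPS: nodes x per-node teraFLOPS / 1e6."""
    return nodes * node_peak_tf / 1e6


def peak_table() -> dict:
    """Map every (node count, per-node TF) pair to a system peak in EF."""
    return {
        (nodes, tf): system_peak_ef(nodes, tf)
        for nodes in NODE_COUNTS
        for tf in NODE_PEAK_FP64_TF
    }


if __name__ == "__main__":
    for (nodes, tf), ef in peak_table().items():
        print(f"{nodes:>7,} nodes x {tf:5.2f} TF = {ef:.3f} EF FP64")
```

Every combination lands between roughly 1.48 and 1.51 EF, so the disagreement over node count changes the estimated peak by only about 2%.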
Node architecture
Each compute node has:
- 1x SW26010P CPU
  - 2.25 GHz[4]
  - 6x core groups (CGs)
    - 1x memory controller each
      - 16 GB DDR4 at 51.2 GB/s
    - 1x management processing element (MPE)
    - 1x 8x8 computing processing element (CPE) cluster (64 cores), each with a 512-bit vector unit
  - 390 cores (“processing elements”)
    - 6 core groups × 64 PEs per core group = 384 cores
    - 6 core groups × 1 management PE = 6 cores
    - 384 + 6 = 390 cores total
- 96 GB DDR4 at 307.2 GB/s
- 13.8 TF[4] or 14.03 TF[1] FP64
- 27.6 TF[4] or 14.03 TF[1] FP32
- 55.30 TF FP16 and BF16[1]
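The per-node totals above follow from the per-core-group figures, and can be checked with a short sketch. The peak-FLOPS estimate rests on assumptions that are not stated in the sources: only the CPEs are counted, and each CPE retires one 512-bit fused multiply-add (2 FLOPs per lane) per cycle. All names are hypothetical.

```python
CORE_GROUPS = 6            # core groups (CGs) per SW26010P
CPES_PER_CG = 64           # one 8x8 CPE cluster per core group
MPES_PER_CG = 1            # one management processing element per core group
MEM_GB_PER_CG = 16         # DDR4 capacity behind each memory controller
MEM_BW_GBS_PER_CG = 51.2   # DDR4 bandwidth per memory controller
CLOCK_GHZ = 2.25
VECTOR_BITS = 512


def node_cores() -> int:
    """Total processing elements: CPEs plus MPEs across all core groups."""
    return CORE_GROUPS * (CPES_PER_CG + MPES_PER_CG)


def node_memory_gb() -> int:
    return CORE_GROUPS * MEM_GB_PER_CG


def node_mem_bw_gbs() -> float:
    return CORE_GROUPS * MEM_BW_GBS_PER_CG


def cpe_peak_tf(bits_per_element: int, flops_per_lane: int = 2) -> float:
    """Estimated peak TFLOPS from the CPEs alone.

    ASSUMPTION: one vector FMA (flops_per_lane = 2) per CPE per cycle.
    """
    lanes = VECTOR_BITS // bits_per_element
    cpes = CORE_GROUPS * CPES_PER_CG
    return cpes * CLOCK_GHZ * lanes * flops_per_lane / 1000


if __name__ == "__main__":
    print(node_cores(), "cores,", node_memory_gb(), "GB,",
          round(node_mem_bw_gbs(), 1), "GB/s")
    for name, bits in [("FP64", 64), ("FP32", 32), ("FP16", 16)]:
        print(f"{name}: {cpe_peak_tf(bits):.3f} TF")
```

Under these assumptions the estimate gives 13.824, 27.648, and 55.296 TF for FP64, FP32, and FP16. That is consistent with the 13.8 TF, 27.6 TF, and 55.30 TF figures, but it does not reproduce the 14.03 TF figure reported for FP64 and FP32.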
Network architecture
Groups of 256 compute nodes are combined into supernodes, each interconnected as a non-blocking fat tree. These supernodes are connected in a second-level fat tree with 16:3 oversubscription.[1]
In addition, OceanLite has a second network dedicated to I/O traffic.[1]
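The supernode size and taper ratio can be turned into a few concrete numbers. The sketch below assumes the usual reading of 16:3 oversubscription (for every 16 units of node-facing bandwidth in a supernode, 3 units go up to the second level); the sources do not spell this out, and the function names are hypothetical.

```python
from fractions import Fraction

SUPERNODE_SIZE = 256                # compute nodes per supernode
OVERSUBSCRIPTION = Fraction(16, 3)  # ASSUMED: node-facing : upward bandwidth


def supernodes_needed(nodes: int) -> int:
    """Supernodes required to hold `nodes` compute nodes (ceiling division)."""
    return -(-nodes // SUPERNODE_SIZE)


def uplink_fraction() -> Fraction:
    """Share of a supernode's injection bandwidth that can leave it."""
    return 1 / OVERSUBSCRIPTION


def full_rate_uplink_equivalents() -> Fraction:
    """Number of nodes per supernode that could send off-supernode at full rate."""
    return SUPERNODE_SIZE * uplink_fraction()


if __name__ == "__main__":
    for nodes in (107_520, 107_250, 107_136):
        print(f"{nodes:,} nodes -> {supernodes_needed(nodes)} supernodes")
    print("uplink fraction:", uplink_fraction())
    print("full-rate equivalents:", full_rate_uplink_equivalents())
```

Of the three reported node counts, only 107,520 is an exact multiple of 256 (420 supernodes). Under the assumed reading of the taper, a supernode's uplinks carry the equivalent of 48 of its 256 nodes transmitting at full rate.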
Footnotes
[1] BaGuaLu, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
[2] Closing the “Quantum Supremacy” Gap: Achieving Real-Time Simulation of a Random Quantum Circuit Using a New Sunway Supercomputer, arXiv:2110.14502
[3] How China Made An Exascale Supercomputer Out Of Old 14 Nanometer Tech
[4] China’s New(ish) SW26010-Pro Supercomputer at SC23