Depth limit: Oceananigans achieves standard climate simulation

Authors:
(1) Simone Silvestri, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(2) Gregory Wagner, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(3) Christopher Hill, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(4) Matin Raayai Ardakani, Northeastern University, Boston, Massachusetts, USA;
(5) Johannes Blaschke, Lawrence Berkeley National Laboratory, Berkeley, California, USA;
(6) Valentin Churavy, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(7) Jean-Michel Campin, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(8) Navid Constantinou, Australian National University, Canberra, ACT, Australia;
(9) Alan Edelman, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(10) John Marshall, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(11) Ali Ramadhan, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(12) Andre Souza, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
(13) Raffaele Ferrari, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Table of links
Abstract and 1 Justification
2 Performance attributes
3 Overview of the problem
4 Current state of the art
5 Innovations
5.1 Starting from scratch with Julia
5.2 New numerical methods for finite volume fluid dynamics on the sphere
5.3 Optimization of ocean free surface dynamics for unprecedented GPU scalability
6 How performance was measured
7 Performance results and 7.1 Scaling results
7.2 Energy efficiency
8 Implications
9 Acknowledgments and references
6 How performance was measured
The performance of the Oceananigans model is assessed with near-global ocean simulations on two domains: a realistic (R) domain and an aqua planet (AP) domain. Both domains span the full longitudinal extent of the sphere and cover latitudes from 75°S to 75°N.
The realistic domain has realistic bathymetry and is forced by realistic surface momentum, heat, and salinity fluxes derived from the ECCO2 state estimate [31]. It is run at three resolutions:
• OceananigansR12, with 1/12 degree horizontal resolution (∼7 km) and 48 vertical levels
• OceananigansR24, with 1/24 degree horizontal resolution (∼3.4 km) and 100 vertical levels
• OceananigansR48, with 1/48 degree horizontal resolution (∼1.7 km) and 100 vertical levels
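The approximate grid spacings quoted in this list can be related to the angular resolutions with a short calculation. The sketch below is only one plausible reading, not a method stated in the text: it assumes the spacing is the zonal (east-west) spacing averaged over latitudes between 75°S and 75°N, and it assumes a mean Earth radius of 6371 km. The function name `mean_zonal_spacing_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def mean_zonal_spacing_km(resolution_deg, max_lat_deg=75.0):
    """Zonal grid spacing in km, averaged over latitudes in [-max_lat, +max_lat].

    Assumption: spacing at latitude phi is resolution * km_per_degree * cos(phi),
    and the latitude average of cos(phi) over a symmetric band is sin(L) / L.
    """
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    max_lat_rad = math.radians(max_lat_deg)
    mean_cos = math.sin(max_lat_rad) / max_lat_rad
    return resolution_deg * km_per_degree * mean_cos


for denominator in (12, 24, 48):
    spacing = mean_zonal_spacing_km(1.0 / denominator)
    print(f"1/{denominator} degree: about {spacing:.2f} km")
```

Under these assumptions the three resolutions come out near 6.8 km, 3.4 km, and 1.7 km, which is in line with the rounded values in the list.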
Figure 3 shows the surface vertical vorticity after one year of integration for OceananigansR12 and OceananigansR48, both over the globe and for selected regions that reveal more detail. Both OceananigansR48 and OceananigansR12 exhibit upper-ocean turbulent features that are not currently resolved by most IPCC-class models. The OceananigansR48 solution exhibits fronts, filaments, and other "submesoscale" features, which have been achieved only a few times in global simulations.
The idealized OceananigansAP suite of simulations [13], which has idealized bathymetry and surface forcing that does not require interpolation to different resolutions, is used for weak scaling experiments. All OceananigansAP experiments have 100 vertical levels and three ridges that divide the global ocean into two basins. We vary the horizontal resolution of OceananigansAP from 1/6 of a degree (∼14 km) to 1/66 of a degree (∼488 m).
None of our simulations require explicit horizontal diffusion of momentum or tracers, owing to the adaptive WENO advection scheme described in Section 5.2. All simulations use a Richardson-number-based parameterization for vertical mixing due to unresolved shear and convective instabilities at scales of 1 to 100 meters.
To evaluate the time-to-solution of each experiment in simulated years per day (SYPD), we measure the average wall clock time per time step. Timing samples are collected with NVIDIA's Nsight Systems and recorded by the NVIDIA Tools Extension Library via the NVTX.jl Julia package.
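As an illustration of how an average wall clock time per time step translates into SYPD, here is a minimal Python sketch. The model time step is not given in this section, so `model_time_step_s` is a free parameter, and the 365-day year and the function names are assumptions made for the example.

```python
SECONDS_PER_DAY = 86_400.0
DAYS_PER_YEAR = 365.0  # assumption: a 365-day simulated year


def mean_wall_time_s(samples_s):
    """Average wall clock time per time step from a list of timing samples."""
    return sum(samples_s) / len(samples_s)


def sypd(model_time_step_s, wall_time_per_step_s):
    """Simulated years per wall clock day.

    model_time_step_s: simulated seconds advanced by one time step
                       (hypothetical input, not stated in the text).
    wall_time_per_step_s: measured wall clock seconds per time step.
    """
    steps_per_wall_day = SECONDS_PER_DAY / wall_time_per_step_s
    simulated_seconds_per_wall_day = steps_per_wall_day * model_time_step_s
    return simulated_seconds_per_wall_day / (SECONDS_PER_DAY * DAYS_PER_YEAR)


# Hypothetical timing samples (seconds per step) and a hypothetical 180 s time step.
samples = [0.49, 0.50, 0.51]
print(sypd(180.0, mean_wall_time_s(samples)))
```

The ratio simplifies to the model time step divided by the product of wall time per step and the number of days per year, so halving the wall time per step doubles SYPD.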
To assess the energy efficiency of each solution in simulated years per megawatt-hour (SYPMWh), we combine SYPD with an estimate of the average power draw over the duration of the experiment. On MIT Satori [2], which has 256 NVIDIA V100s, we have access to precise power metering for billing. For all simulations on NVIDIA A100s, we estimate the power consumption P with equation (1), where D is the number of A100s and N is the number of nodes.
We also note that energy estimates are provided for LICOM3 and COSMO, but not for LLC4320 or Veros. To estimate the power consumption of LLC4320, we assume that each of the 1000 CPU nodes draws 500 W. We estimate the energy consumption of iHESP CESM [51] and HadGEM3 [37] as a fraction of the peak power consumption of their clusters. We use equation (1) to estimate the power consumption of Veros on one node with 16 A100s.
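The unit conversion behind SYPMWh can be sketched in a few lines of Python. The sketch assumes a constant average power draw over a wall clock day. The function names are hypothetical, and the only figure taken from the text is the LLC4320 assumption of 1000 CPU nodes drawing 500 W each; the SYPD value in the example is a placeholder, not a reported result.

```python
def uniform_node_power_w(n_nodes, watts_per_node):
    """Total power draw when every node is assumed to draw the same power."""
    return n_nodes * watts_per_node


def sypmwh(sypd, average_power_w):
    """Simulated years per megawatt-hour.

    A machine drawing average_power_w for 24 hours consumes
    average_power_w / 1e6 * 24 MWh, during which it simulates `sypd` years.
    """
    mwh_per_wall_day = average_power_w / 1e6 * 24.0
    return sypd / mwh_per_wall_day


# Power estimate following the LLC4320 assumption quoted in the text.
llc4320_power_w = uniform_node_power_w(1000, 500.0)
print(llc4320_power_w)  # 500000.0 W, i.e. 0.5 MW

# Placeholder SYPD, only to show the conversion.
print(sypmwh(12.0, llc4320_power_w))
```

Because energy per day grows linearly with power, a run that doubles its SYPD by doubling its power draw leaves SYPMWh unchanged.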
7 Performance results
We report both scaling results, as time-to-solution in SYPD, and energy efficiency results, as energy-to-solution in SYPMWh.
7.1 Scaling results
Realistic ocean simulations (Satori and Engaging clusters). We report strong scaling tests using the realistic global setup shown in Figure 3 on two clusters: (1) the MIT Satori cluster [2], a high-performance Power 9 system composed of 64 Power 9 nodes that each host four NVIDIA V100 GPUs with 32 GB of memory each, and (2) the MIT Engaging cluster, using 8 nodes that each host four NVLink-connected A100s with 80 GB of memory each. The resulting wall clock time per time step, averaged over 1500 time steps, is shown in Figure 4 for both single precision (FP32) and double precision (FP64) computations. On a single node, OceananigansR12 achieves 0.9 SYPD in double precision and 1.4 SYPD in single precision, with a wall clock time per time step ranging from 330 to 550 milliseconds. When the number of nodes increases to 16 (64 GPUs), the communication overhead increases, resulting in 12.4 SYPD in single precision and 7.75 SYPD in double precision. We measure a strong scaling efficiency of 52% in single precision and 55% in double precision on 64 GPUs, because the computational workload (40 ms of wall clock time per time step) becomes too short to completely hide the communication overhead.
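Strong scaling efficiency is conventionally the measured speedup divided by the ideal speedup for a fixed problem size. The following Python sketch applies that standard definition to the rounded SYPD values quoted above; the function name is hypothetical. With rounded inputs it gives roughly 0.55 (FP32) and 0.54 (FP64), close to but not identical with the reported efficiencies, which were presumably derived from the measured per-step timings.

```python
def strong_scaling_efficiency(base_sypd, base_nodes, scaled_sypd, scaled_nodes):
    """Measured speedup divided by ideal speedup for a fixed problem size."""
    speedup = scaled_sypd / base_sypd
    ideal_speedup = scaled_nodes / base_nodes
    return speedup / ideal_speedup


# Rounded OceananigansR12 values from the text: 1 node versus 16 nodes.
fp32_efficiency = strong_scaling_efficiency(1.4, 1, 12.4, 16)
fp64_efficiency = strong_scaling_efficiency(0.9, 1, 7.75, 16)

print(f"FP32: {fp32_efficiency:.2f}")
print(f"FP64: {fp64_efficiency:.2f}")
```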
For the higher-resolution ocean weather-permitting simulations, the scaling is almost ideal across the range we investigated. For OceananigansR24 (FP64-V100) and OceananigansR48 (FP32-V100), we measure better-than-ideal scaling. This counterintuitive result is a product of improved load balance as the number of GPUs increases. In summary, we attain 1.94 SYPD on 120 V100 GPUs at kilometer-scale resolution (OceananigansR24) and 0.33 SYPD with an ocean weather-resolving simulation (OceananigansR48). Finally, we tested the OceananigansR48 setup on 144 Perlmutter nodes (576 A100 GPUs), reaching 0.95 SYPD. This is the first instance of a kilometer-scale ocean simulation achieving about 1 SYPD. We also tested the OceananigansR12 setup on 17 nodes, obtaining 9.9 SYPD (see Figure 5).
Aqua-planet simulations (Perlmutter). We report weak scaling tests on the NERSC supercomputer Perlmutter. Perlmutter is an HPE (Hewlett Packard Enterprise) Cray EX supercomputer that hosts four A100 GPUs with 40 GB of memory each per node, linked through NVLink3. All weak scaling tests are performed using the OceananigansAP setup in double precision. We start from two different horizontal resolutions (1/12 and 1/6 of a degree) and progressively increase them with the number of GPUs while maintaining 100 vertical levels. As shown in Figure 5, we obtain 100% weak scaling efficiency over the whole investigated range (from 1 to 196 nodes, or from 4 to 768 A100s).
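Weak scaling keeps the work per GPU fixed, so efficiency is conventionally the ratio of the baseline wall clock time per time step to the time at the larger GPU count. The Python sketch below shows that definition, plus a helper for how the horizontal resolution might grow with GPU count. The helper rests on an assumption not stated in the text: that the number of horizontal grid points grows in proportion to the GPU count and equally in both horizontal directions. All names and timing values are hypothetical.

```python
import math


def weak_scaling_efficiency(base_wall_time_s, scaled_wall_time_s):
    """Ratio of baseline to scaled wall clock time per step at fixed work per GPU."""
    return base_wall_time_s / scaled_wall_time_s


def scaled_resolution_denominator(base_denominator, base_gpus, gpus):
    """Denominator d of a 1/d degree grid after scaling up the GPU count.

    Assumption: horizontal grid points scale linearly with the GPU count,
    so each horizontal direction is refined by sqrt(gpus / base_gpus).
    """
    return base_denominator * math.sqrt(gpus / base_gpus)


# Hypothetical timings: identical time per step means 100% efficiency.
print(weak_scaling_efficiency(0.2, 0.2))

# Going from 4 to 16 GPUs under the stated assumption refines 1/12 to 1/24 degree.
print(scaled_resolution_denominator(12, 4, 16))
```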