Intel formally launches Ponte Vecchio as Data Center GPU Max, server blades already shipping

The Intel ‘Ponte Vecchio’ GPU, or the ‘Intel Data Center GPU Max Series’ as the company now likes to call it, is a major product with 128 Xe cores, 128 RT cores (making it the only HPC / AI GPU with native ray tracing cores), up to 64 MB of L1 cache and up to 408 MB of L2 cache. It carries 128GB of HBM2e memory, and the IO will connect up to 8 discrete dies. The GPU uses PCIe Gen 5 for the host interface along with Xe Link for GPU-to-GPU connections. It is built using a mix of Intel 7, TSMC N5 and TSMC N7 process nodes, packaged with EMIB and Foveros. Max Series GPUs will be available in several form factors to address different customer needs:

Max Series 1100 GPU: A 300-watt double-wide PCIe card with 56 Xe cores and 48GB of HBM2e memory. Multiple cards can be connected via Intel Xe Link bridges.
Max Series 1350 GPU: A 450-watt OAM module with 112 Xe cores and 96GB of HBM.
Max Series 1550 GPU: Intel’s maximum performance 600-watt OAM module with 128 Xe cores and 128GB of HBM.

Intel says the architecture will allow up to 8 OAMs to be connected for absolute beast-mode performance. Based on the numbers it gave for 4 OAMs, we can calculate the following:

1 OAM: 128GB HBM2e, 128 Xe Cores, 600W TDP, 52 TFLOPS, 3.2 TB/s memory bandwidth
2 OAM: 256GB HBM2e, 256 Xe Cores, 1200W TDP, 104 TFLOPS, 6.4 TB/s memory bandwidth
4 OAM: 512GB HBM2e, 512 Xe Cores, 2400W TDP, 208 TFLOPS, 12.8 TB/s memory bandwidth
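The figures above are simple multiples of the single-OAM numbers. As a rough sketch of that arithmetic, assuming perfectly linear scaling across modules (an idealization; real multi-GPU workloads rarely scale this cleanly), the aggregates can be reproduced as follows. The `OamSpec` class and `scale` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OamSpec:
    """Headline figures for one or more OAM modules (illustrative only)."""
    hbm_gb: int
    xe_cores: int
    tdp_w: int
    tflops: float
    mem_bw_tbs: float


# Single Max Series 1550 OAM, using the per-module numbers quoted in the article.
MAX_1550 = OamSpec(hbm_gb=128, xe_cores=128, tdp_w=600, tflops=52.0, mem_bw_tbs=3.2)


def scale(spec: OamSpec, n: int) -> OamSpec:
    """Multiply every figure by n, assuming ideal linear scaling."""
    return OamSpec(
        hbm_gb=spec.hbm_gb * n,
        xe_cores=spec.xe_cores * n,
        tdp_w=spec.tdp_w * n,
        tflops=round(spec.tflops * n, 1),
        mem_bw_tbs=round(spec.mem_bw_tbs * n, 1),
    )


if __name__ == "__main__":
    for n in (1, 2, 4):
        s = scale(MAX_1550, n)
        print(f"{n} OAM: {s.hbm_gb}GB HBM2e, {s.xe_cores} Xe Cores, "
              f"{s.tdp_w}W TDP, {s.tflops:g} TFLOPS, {s.mem_bw_tbs:g} TB/s")
```

These are aggregate peak numbers, not measured throughput; the function only restates the multiplication behind the list.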

Now let’s talk about performance. Max Series GPUs deliver up to 128 Xe-HPC cores, the new foundational architecture targeted at the most demanding computing workloads. Intel is claiming that each OAM is 2x faster than an NVIDIA A100 in OpenMC and miniBUDE. Intel also states that the Data Center GPU Max Series has an aggregate 1.5x performance lead in ExaSMR - NekRS virtual nuclear reactor simulation workloads like AdvSub, FDM (FP32), AxHelm (FP32) and AxHelm (FP64). Finally, it is also claiming the performance crown (when compared with the NVIDIA A100) on financial workloads like Riskfuel, which is used to train credit options pricing models.

Intel also reiterated its intention to release the beastly successor to Ponte Vecchio, which will be Rialto Bridge. It will house up to 160 Xe cores in a new OAM v2 form factor. The biggest change to the GPU is in the die layout. While Ponte Vecchio has 16 Xe-HPC dies, each with 8 Xe cores for a total of 128 cores or 16,384 ALUs, the Rialto Bridge GPU comes with 8 Xe-HPC dies. So that should be 20 Xe cores per die for a total of 160 Xe cores on the 8 dies. That works out to 20,480 ALUs, which is a 25 percent increase over its predecessor.
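The Rialto Bridge core and ALU counts follow from the Ponte Vecchio figures quoted above. A minimal sketch of that arithmetic, assuming Rialto Bridge keeps the same number of ALUs per Xe core as Ponte Vecchio (the article's estimate implies this, but Intel has not confirmed it here); all variable names are illustrative.

```python
# Ponte Vecchio figures quoted in the article.
PV_DIES = 16
PV_CORES_PER_DIE = 8
PV_ALUS = 16_384

pv_cores = PV_DIES * PV_CORES_PER_DIE   # total Xe cores on Ponte Vecchio
alus_per_core = PV_ALUS // pv_cores     # ALUs in each Xe core

# Rialto Bridge figures quoted in the article.
RB_DIES = 8
RB_CORES = 160

rb_cores_per_die = RB_CORES // RB_DIES  # Xe cores on each Rialto Bridge die

# Assumption: ALUs per Xe core are unchanged from Ponte Vecchio.
rb_alus = RB_CORES * alus_per_core

increase_pct = (rb_alus - PV_ALUS) / PV_ALUS * 100

print(f"Ponte Vecchio: {pv_cores} Xe cores, {alus_per_core} ALUs per core")
print(f"Rialto Bridge: {rb_cores_per_die} Xe cores per die, {rb_alus} ALUs")
print(f"ALU increase: {increase_pct:.0f} percent")
```

Because the ALU count per core is held constant, the 25 percent ALU increase is the same as the increase in Xe cores from 128 to 160.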
