Aimed at machine learning and AI, the Intel Xe HP GPU was previously shown in its smallest, 1-tile configuration at Intel Architecture Day. It has now appeared at HotChips in its largest, 4-tile configuration.
In case you missed it earlier, the Xe HP GPU will be available in 1-tile, 2-tile, and 4-tile configurations. While we do not have any precise details, Intel did show a demo of how the Xe HP GPU scales, with peak FP32 hitting 10588 GFLOPs in the 1-tile, 21161 GFLOPs in the 2-tile, and 41908 GFLOPs in the 4-tile configuration, and described it as a chip that brings "the most FP32 performance in a single package".
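To put those demo numbers in perspective, here is a minimal Python sketch that works out the speedup and per-tile scaling efficiency relative to the 1-tile result. The only inputs are the three GFLOPs figures quoted above; the function and variable names are illustrative and not taken from Intel.

```python
# Peak FP32 figures (GFLOPs) quoted from Intel's scaling demo, keyed by tile count.
PEAK_GFLOPS = {1: 10588, 2: 21161, 4: 41908}


def scaling(peak):
    """Return {tiles: (speedup, efficiency)} relative to the 1-tile figure."""
    base = peak[1]
    result = {}
    for tiles, gflops in sorted(peak.items()):
        speedup = gflops / base        # how many times faster than 1 tile
        efficiency = speedup / tiles   # 1.0 would be perfectly linear scaling
        result[tiles] = (speedup, efficiency)
    return result


for tiles, (speedup, efficiency) in scaling(PEAK_GFLOPS).items():
    print(f"{tiles}-tile: {speedup:.3f}x speedup, {efficiency:.1%} scaling efficiency")
```

By these figures the 2-tile part lands just under 2x and the 4-tile part just under 4x the 1-tile result, which is close to linear scaling.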
Unfortunately, we are still missing key architecture and packaging details on how those tiles are connected.
The Xe HP GPU shows some impressive scaling, and Tomshardware.com did some number crunching, suggesting that 2048 EUs, each capable of 128 operations per cycle, with 2 FMA units, would add up to 524,288 operations per cycle. That also means the GPU would need a clock of roughly 2GHz to hit PetaFLOP-level performance, or would have to deliver more than 128 operations per cycle per EU/tensor core.
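The arithmetic behind that estimate is easy to reproduce. The sketch below is a back-of-the-envelope check only: it assumes the "2 FMA units" act as a plain factor of 2 on per-cycle throughput, and that "PetaFLOP" means 1e15 operations per second. The constant and function names are hypothetical, and none of this reflects confirmed Xe HP specifications.

```python
# Figures quoted in the estimate above; not confirmed specifications.
EUS = 2048                  # execution units across all tiles
OPS_PER_CYCLE_PER_EU = 128  # operations per cycle per EU
FMA_FACTOR = 2              # assumption: the 2 FMA units double per-cycle throughput

PETA = 1e15                 # assumption: "PetaFLOP" = 1e15 operations per second


def ops_per_cycle(eus=EUS, ops=OPS_PER_CYCLE_PER_EU, fma=FMA_FACTOR):
    """Total operations the whole GPU could issue in one clock cycle."""
    return eus * ops * fma


def peak_ops_per_second(clock_hz):
    """Peak throughput at a given clock frequency in Hz."""
    return ops_per_cycle() * clock_hz


def clock_for_target(target_ops_per_second):
    """Clock frequency in Hz needed to reach a target throughput."""
    return target_ops_per_second / ops_per_cycle()


print(f"Operations per cycle: {ops_per_cycle():,}")
print(f"Clock needed for 1 peta-op/s: {clock_for_target(PETA) / 1e9:.3f} GHz")
print(f"Throughput at 2 GHz: {peak_ops_per_second(2e9) / PETA:.3f} peta-ops/s")
```

Under these assumptions the per-cycle total matches the 524,288 figure, and the required clock comes out slightly below 2GHz, which is why a 2GHz clock would be enough to clear the PetaFLOP mark.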
In any case, we will surely hear more about Intel's Xe GPUs, which will be launching as Xe LP (Low Power), as a part of the upcoming Tiger Lake as well as the DG1 and SG1 cards; Xe HP for data centers; Xe HPC for servers; and Xe HPG, a high-performance, gaming-optimized GPU coming in 2021.