Mali GPUs are used in about 85 percent of smart TVs and 50 percent of smartphones. Over a billion Mali GPUs were shipped in 2019 alone. The new Mali-G78 can help Arm deliver PC- and console-class graphics, better XR, and higher machine learning performance than previous generations.
25 percent better performance
On complex mixed workloads, and counting architectural, process, and other improvements, Mali-G78 delivers about 25 percent more performance than Mali-G77. Arm's figure compares the projected performance of Mali-G78 against existing 2019 devices. We had a long and pleasant chat with Stephen Barton, product manager, Client Line of Business; Daniel Kerry, principal engineer, Central Engineering; and Ian Hutchinson, director of outbound marketing, Client Line of Business at Arm.
When we asked where the 25 percent improvement comes from, we learned that about 15 percent comes from the move from today's 7nm process to the upcoming 5nm process. The rest comes from internal optimizations. The 5nm process will also bring a smaller die size.
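A little arithmetic makes that split more concrete. The sketch below assumes the process and architectural gains compound multiplicatively (an assumption: Arm did not say how the two figures combine) and backs out the implied non-process share from the quoted 25 and 15 percent. The helper name `residual_gain` is hypothetical.

```python
def residual_gain(total_gain: float, known_gain: float) -> float:
    """Return the remaining fractional gain once a known gain is factored out.

    Assumes the gains compound multiplicatively:
    (1 + total) = (1 + known) * (1 + residual).
    """
    return (1.0 + total_gain) / (1.0 + known_gain) - 1.0


total = 0.25    # overall Mali-G78 vs Mali-G77 uplift quoted by Arm
process = 0.15  # approximate share attributed to the 7nm -> 5nm move

arch = residual_gain(total, process)
print(f"architectural/other gain: {arch:.1%}")
```

Under this assumption the internal optimizations account for roughly 8.7 percent; simply subtracting the two percentages would give 10 percent instead.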
Mali-G78 brings 15 percent more performance density, 10 percent better energy efficiency, and a 15 percent uplift in machine learning performance.
18- and 24-core versions
It supports up to 24 cores, enabling Arm's highest-ever performance point. The "game-changing" Asynchronous Top Level maximizes the performance productivity of the cores, Arm claimed. The Fused Multiply-Add (FMA) unit, which is heavily used in graphics and ML processing, has been rebuilt from the ground up and uses 30 percent less energy.
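A fused multiply-add computes a * b + c as one operation with a single rounding step, instead of rounding the product and then rounding the sum. The Python sketch below emulates that behaviour in software using exact rational arithmetic and uses it for a dot product, the kind of inner loop that dominates ML layers and vertex transforms. It illustrates only the arithmetic semantics: Python floats are 64-bit, whereas GPU shaders typically work in 32- or 16-bit floating point, and nothing here reflects how Mali's FMA unit is actually built. The helper names `fma`, `unfused`, and `dot` are hypothetical.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


def unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum again."""
    return a * b + c


def dot(xs: list[float], ys: list[float]) -> float:
    """Dot product built as a chain of FMAs, one per element pair."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)
    return acc


print(unfused(0.1, 10.0, -1.0))  # 0.0 (the tiny error in 0.1 is rounded away)
print(fma(0.1, 10.0, -1.0))      # 5.551115123125783e-17 (error preserved)
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Keeping the product unrounded is why FMA is both more accurate and cheaper than a separate multiply and add, which is what makes it worth optimizing in hardware.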
Mali-G78 is Arm's highest-performing GPU based on the Valhall architecture. Mali-G77 was also based on Valhall, which focuses on a superscalar engine, unified memory, and a simplified scalar ISA. One big change from G77 to G78 is the maximum core count, which increases from 16 to 24.
Asynchronous Top Level introduces two asynchronous clock domains: one for the shader cores and one for the job manager, tiler, MMU, control fabric, and L2 cache. With Asynchronous Top Level, the shader cores can be clocked twice as fast as the rest of the GPU, allowing higher performance.
Arm showed that the 24-core version of the GPU scores 13 percent higher in benchmarks than the 18-core version. Once Asynchronous Top Level is enabled, performance increases by a further eight percent, reaching a whopping 23 percent above the nominal 18-core performance.
Gaming performance
In games, the 24-core version is 11 percent faster than the 18-core version, while the 18-core version with Asynchronous Top Level is 14 percent faster. The 24-core Mali-G78 with Asynchronous Top Level is 28 percent faster than the nominal 18-core configuration.
As with any mobile processor, Asynchronous Top Level raises the clock when needed; once the job is done and the target frame rate is reached, it returns to a standard clock for sustainable performance and energy consumption.
Under similar conditions, average energy consumption is 10 percent lower on G78 than on last year's G77.
Using Asynchronous Top Level can reduce energy consumption by 6 to 13 percent.
The biggest benefits come in complex gaming scenes involving smoke, grass, and trees. In actual games, optimized content can run 5 to 17 percent faster than on Mali-G77.
Arm's Performance Advisor tool helps developers achieve higher performance on Arm hardware, and its frame analysis lets them quickly understand bottlenecks. Last but not least, the new Mali-G78 GPU brings 15 percent higher machine learning performance.
Machine learning on the GPU covers a variety of mobile use cases, including security (e.g., face unlock), video and camera modes, gaming, and augmented reality (AR). These workloads can also run in collaboration with an NPU.
Asynchronous Top Level also boosts ML performance by letting the shader cores run at a higher clock. We can expect Mali-G78-based phones next year.