
Nvidia GTX 980 reviewed

19 September 2014


KeyVisual 0 top-value-2008-lr

Review: Pinnacle of efficiency

Nvidia has released two new graphics cards based on its latest Maxwell GPU architecture. The Geforce GTX 970 and Geforce GTX 980 will replace the outgoing Kepler-based Geforce GTX 780 and Geforce GTX 780 Ti.

Nvidia designed Maxwell to deliver a twofold increase in performance-per-watt. This is very good news since it indicates a big improvement in efficiency and performance. On a side note, Maxwell is Nvidia’s tenth GPU generation, so some celebrations are in order.

0-GTX 980

With the GTX 980 and GTX 970, Nvidia also decided to launch a few new technologies that were not employed in the first incarnation of Maxwell on the GTX 750 series. Voxel Global Illumination, Multi Frame Anti-aliasing and Dynamic Super Resolution are the new technologies in question. They are built around old concepts, but they do a good job of reinventing them and we will take a closer look at each one of them later in our review.

Of course, the GTX 980 and GTX 970 are not the first cards based on the Maxwell architecture. Earlier this year Nvidia introduced the Geforce GTX 750 Ti, an entry level card with good performance and extremely low power consumption. At just 55W, the GTX 750 Ti is capable of delivering 1080p gaming. Several partners decided to launch low-profile versions of GTX 750 Ti, making it a very potent graphics card for small form factor rigs.

The GTX 980 and GTX 970 need quite a bit more power, but they are still vastly more efficient than their Kepler-based predecessors. The GTX 980 has a TDP of 165W, while the GTX 970 is rated at 145W. The old GTX 780 Ti and GTX 780 have a TDP of 250W.

What’s more, the GTX 980 and GTX 970 should be on a par with the GTX 780 Ti and GTX 780 respectively. Both cards are capable of delivering smooth frame rates at resolutions up to 2560x1600. In case you are planning to build a 4K capable system, two cards should be enough for good frame rates and anti-aliasing in demanding titles.

Both cards are based on the same GM204 GPU, manufactured on TSMC’s venerable 28nm process. The GPU packs 5.2 billion transistors and has a die size of 398mm². The GTX 980 comes with 2048 CUDA cores, while the GTX 970 has 1664 CUDA cores enabled.

The base GPU clock of the reference GTX 980 graphics card is set at 1126MHz while the Boost clock is set at 1216MHz. The reference GTX 980 features 4GB of GDDR5 memory clocked at 1750MHz (7000MHz effective), providing up to 224GB/sec of peak memory bandwidth.

The GTX 980 cooler resembles the GTX 780 series cooler when viewed from the front, but once the card is turned around it is easy to spot the difference: Nvidia added a new backplate and I/O panel. The card sports a total of five display outputs: one DVI-I, one HDMI 2.0 and three DisplayPort 1.2 ports.

1-GTX 980

An important new feature (at least for gamers who want to show off on social networks) launched with GTX 780 cards in the form of ShadowPlay. This is an automatic gameplay recorder that uses GPU encoding and it ships with Geforce Experience 1.7 and higher. On the GTX 980, ShadowPlay supports 4K/UHD resolutions.

Shield PC streaming and G-Sync are supported as well. These features should help add a bit more appeal to GK110/GM204 based graphics cards. G-Sync synchronizes GeForce Kepler/Maxwell cards with G-Sync capable monitors. This technology eliminates the frame stuttering and tearing caused by a lack of synchronization between your graphics card and monitor. The Geforce GTX 980 also features the same GPU Boost 2.0 technology used in the Geforce GTX Titan, with more advanced controls for auto-overclocking, fan control, and hardware monitoring.

Of course, DirectX 12 is supported on all Maxwell cards.


The GeForce GTX 980 uses the new GM204 GPU, which is based on Nvidia’s latest Maxwell architecture. The GPU is split into four Graphics Processing Clusters (GPCs) and each GPC has four Streaming Multiprocessor Maxwell (SMM) units, which adds up to 16 SMMs and 2048 CUDA cores. The GTX 970 is based on the same GM204 GPU, but has fewer units enabled, i.e. 13 SMMs for a total of 1664 CUDA cores. The rest of the specification includes 2MB of L2 cache, 64 ROPs, 128 TMUs and a 256-bit memory interface.
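The core counts above follow from simple multiplication. The short Python sketch below reproduces them; note that the figure of 128 CUDA cores per SMM is not quoted in the text, so we infer it from the GTX 970 numbers (1664 cores across 13 SMMs), and the function name is ours.

```python
# Figures from the text: 4 GPCs, 4 SMMs per GPC, 13 SMMs enabled on the GTX 970.
GPCS = 4
SMMS_PER_GPC = 4

# Assumption: cores per SMM is inferred from 1664 cores / 13 SMMs, not quoted directly.
CORES_PER_SMM = 1664 // 13  # 128


def cuda_cores(enabled_smms: int, cores_per_smm: int = CORES_PER_SMM) -> int:
    """Total CUDA cores for a given number of enabled SMM units."""
    return enabled_smms * cores_per_smm


gtx_980_cores = cuda_cores(GPCS * SMMS_PER_GPC)  # all 16 SMMs enabled
gtx_970_cores = cuda_cores(13)                   # 13 of 16 SMMs enabled

print(gtx_980_cores, gtx_970_cores)
```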

GeForce GTX 980 Block Diagram FINAL-small

Maxwell GPUs have been optimized to consume as little power as possible and deliver a significant efficiency improvement over Kepler products.

The Maxwell architecture was spawned through a number of advances developed for the Tegra K1. Efficiency is crucial in SoC GPUs and mobile parts. Sales data shows that the gaming laptop market is growing at a staggering pace and it has been doing so for more than five years. Power efficient GPUs have a huge market beyond desktop gaming and Maxwell will eventually make its mark on the mobile segment as well.

Before we move on to more details let’s refresh our memory.

Nvidia launched the GK110 in May 2013 as the Geforce GTX 780, which came with 2304 CUDA cores, an 863MHz base GPU clock, 192 texture mapping units (TMUs), 48 render output units (ROPs) and 3GB of GDDR5 memory running on a 384-bit memory bus. The GK110 is split into five GPCs and for the GTX 780 Nvidia enabled 12 out of a total of 15 available SMXs (192 CUDA cores per SMX). Six ROP partitions each handle up to eight pixels per clock, adding up to 48 ROP units. Specification-wise, the GTX 780 ended up with 50% more CUDA cores than its predecessor, the GTX 680, whose GK104 chip has 1536 CUDA cores. This, among other things, gives the GTX 780 a significant performance boost, even though the GK110 is based on the same Kepler architecture used in the GTX 680. The GTX 780 has a 384-bit bus while the GTX 680 is limited to a 256-bit interface (2048MB memory), but the memory speed on both cards is 6008MHz.

Nvidia launched the GK110B in November 2013 as the Geforce GTX 780 Ti, which came with 2880 CUDA cores (all SMXs are enabled, resulting in 25 percent more cores than the original GTX 780). The GPU base clock was set at 876MHz. The rest of the specification includes 240 texture mapping units (TMUs), 48 render output units (ROPs) and 3GB of GDDR5 memory running on a 384-bit memory bus. The GTX 780 Ti comes with faster memory (7000MHz effective) than the GTX 780 (6008MHz), resulting in 336GB/s and 288GB/s of memory bandwidth respectively.

Now back to GTX 980. The GTX 980 has a 256-bit memory interface (224GB/sec bandwidth) compared to the GTX 780 Ti with 384-bit memory interface (336GB/s bandwidth), but it comes with more memory, 4GB versus 3GB memory on the GTX 780 Ti. To improve the effectiveness of the GPU’s memory bandwidth, the GM204 features a new compression engine which can further reduce DRAM bandwidth demands.
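The bandwidth figures quoted for these cards can be checked with a quick calculation: peak bandwidth is the bus width in bytes multiplied by the effective memory clock. Here is a minimal Python sketch using only the bus widths and clocks mentioned in this review; the function name is ours and 1GB is taken as 10^9 bytes.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


gtx_980_bw = peak_bandwidth_gb_s(256, 7000)     # 256-bit bus, 7000MHz effective
gtx_780_ti_bw = peak_bandwidth_gb_s(384, 7000)  # 384-bit bus, 7000MHz effective
gtx_780_bw = peak_bandwidth_gb_s(384, 6008)     # 384-bit bus, 6008MHz effective

print(gtx_980_bw, gtx_780_ti_bw, gtx_780_bw)
```

The results land on 224GB/s, 336GB/s and roughly 288GB/s, matching the numbers quoted above. The calculation ignores the effect of the GM204 compression engine, which reduces bandwidth demand rather than raising the peak figure.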

To improve performance in high resolution/high AA workloads, the number of ROPs increased to 64, up from 48 ROPs on the GTX 780 Ti. Combined with the GTX 980’s higher clock speeds (Base GPU clock is 1126MHz), the GTX 980’s pixel fill rate is 72 Gpixels/sec compared to 32.2Gpixels/sec in GeForce GTX 680 or 42Gpixels/sec in GeForce GTX 780 Ti.
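The fill rate figures can be reproduced in the same way. The sketch below assumes that each ROP outputs one pixel per clock at the base GPU clock, which is our simplification; the ROP counts and clocks are the ones quoted in this review.

```python
def pixel_fill_rate_gpix_s(rops: int, base_clock_mhz: float) -> float:
    """Peak pixel fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * base_clock_mhz / 1000.0


gtx_980_fill = pixel_fill_rate_gpix_s(64, 1126)    # about 72 Gpixels/s
gtx_780_ti_fill = pixel_fill_rate_gpix_s(48, 876)  # about 42 Gpixels/s

print(gtx_980_fill, gtx_780_ti_fill)
```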

Nvidia has rebalanced the design of the Maxwell SM (Streaming Multiprocessor) so that its CUDA Cores are fully utilized more often. This saves power and improves performance. The L2 cache is 2MB – four times larger than on the GK104. By incorporating more cache, fewer requests to GPU DRAM are needed; this improves performance and reduces the GPU’s power consumption.

GeForce GTX 980 SM Diagram FINAL-small

 


Nvidia’s first two design goals for the GM204 were to deliver extraordinary gaming performance for the latest high-resolution displays and to provide improved energy efficiency. The third goal was to deliver a dramatic leap forward in lighting with Voxel Global Illumination (VXGI).

Since the early days of computer generated graphics, engineers and designers have done their best to develop technologies and techniques that would eventually deliver photorealism. Lighting was always a part of the equation. A direct solution to the rendering equation proposed by Kajiya in 1986 is path tracing. This involves following a huge number of rays as they bounce around the scene, but unfortunately the equation is not simple and even the most powerful PCs cannot compute it in real time at an adequate frame rate.

VXGI-1

 VXGI-2

 VXGI-3

Today’s game engines use a creative combination of different techniques. Pre-computed lighting is very common, but it can only be applied to static objects. VXGI can be accelerated natively by the GPU and it allows real-time calculation of approximated dynamic global illumination as well as ambient occlusion. While VXGI’s software algorithm will run on all GPUs, the performance benefits of VXGI hardware acceleration will only be available on Maxwell GPUs.

VXGI will become interesting for gamers only if it is able to compute dynamic global illumination at playable frame rates on all Maxwell cards rather than just high-end cards. According to Nvidia, VXGI works great on the GTX 980 and this card can render dynamic global illumination at frame rates that have never been possible. VXGI is accessible for developers using UE4 and Q4 engines, but other major engines should be supported too.

VXGI is based on the concept of using a 3D structure (voxels) to capture coverage and lighting information at every point of the scene. This data structure can then be traced during the final rendering stage to accurately determine the effect of light bouncing around in the scene.

VXGI incorporates three main steps: render geometry, evaluate light, and gather indirect light.

VXGI-4

First, the geometry (polygons) is translated into voxels. Each voxel is one element in a 3D grid. Voxelized objects are made up of little cubes arranged in a grid.

VXGI-5

Since objects are constructed of primitives, the set of voxels intersected by each primitive is determined, and the voxel opacity is computed (the amount of opaque material contained in the voxel). In the second step, direct light reflected by the voxels is calculated and in the third step bounced light from the environment is collected.
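To make the first step more concrete, here is a rough Python sketch of how the set of voxels touched by a single triangle could be found. This is only an illustration under our own assumptions: it approximates the intersection test by densely sampling points on the triangle, it does not compute opacity, and it is not how Nvidia’s hardware-accelerated voxelization works. The function name, grid size and sample count are all hypothetical.

```python
def voxels_touched_by_triangle(a, b, c, voxel_size=1.0, steps=32):
    """Approximate the voxels a triangle passes through by dense point sampling.

    a, b, c are (x, y, z) vertices. Points are generated from barycentric
    coordinates (u, v, w) with u + v + w = 1 and mapped to integer voxel indices.
    """
    touched = set()
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            u = i / steps
            v = j / steps
            w = 1.0 - u - v
            point = tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3))
            touched.add(tuple(int(p // voxel_size) for p in point))
    return touched


# A flat triangle lying inside the z = 0 layer of a unit-sized voxel grid.
triangle = ((0.5, 0.5, 0.5), (3.5, 0.5, 0.5), (0.5, 3.5, 0.5))
voxels = voxels_touched_by_triangle(*triangle)
print(sorted(voxels))
```

A production voxelizer would use an exact triangle-box overlap test instead of sampling, and would also store how much of each voxel is covered.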

VXGI-6

The original VXGI concept was developed in 2011 by Nvidia researcher Cyril Crassin. The concept relied on voxels that were stored in an octree structure, but in order to get native GPU acceleration a new structure was implemented, along with an improved algorithm. In his original paper Crassin said he used a ‘voxel cone’ as an approximation of the effect of secondary rays. He stated that when it comes to rendering glossy reflections, just one ‘voxel cone’ is used per voxel, while for rendering diffuse materials (color bleeding effects) only a few scattered cones are needed, resulting in very realistic results at a much lower computational cost.
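The appeal of a cone is that one cheap march through the voxel data stands in for many individual rays. The Python sketch below shows the general idea of gathering light along one cone using standard front-to-back compositing. It is a simplification under our own assumptions: the samples are handed in as a plain list rather than fetched from a voxel structure, and neither the function name nor the cutoff value comes from Crassin’s paper or from Nvidia.

```python
def accumulate_cone(samples, opacity_cutoff=0.99):
    """Gather light along one cone using front-to-back compositing.

    samples: (radiance, opacity) pairs ordered from the cone apex outward.
    Returns the accumulated radiance and accumulated opacity.
    """
    radiance = 0.0
    alpha = 0.0
    for sample_radiance, sample_opacity in samples:
        # Only the part of the cone not yet blocked can receive this sample.
        weight = (1.0 - alpha) * sample_opacity
        radiance += weight * sample_radiance
        alpha += weight
        if alpha >= opacity_cutoff:
            break  # the cone is effectively fully occluded
    return radiance, alpha


# A half-transparent bright voxel in front of a half-transparent dark one.
print(accumulate_cone([(1.0, 0.5), (0.0, 0.5)]))
```

Once the cone is fully occluded, samples further away contribute nothing, which is why the loop can stop early.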

VXGI-7

The original demo delivered good results and we are convinced that the full implementation can only be better. Nvidia will demo new technologies including the latest developments in VXGI at the GAME24 event (http://game24.nvidia.com/info).

We can only show you a couple of pictures provided by Nvidia, which are actual screenshots from the Apollo 11 demo, based on the UE4 engine.

VXGI-Apolo-11-tech-demo-1

VXGI-Apolo-11-tech-demo-2

VXGI-Apolo-2-tech-demo-3

VXGI-Apolo-11-tech-demo-5


VXGI-Apolo-11-tech-demo-4

 


Nvidia also introduced a new anti-aliasing mode named Multi-Frame Sampled AA (MFAA), which combines multiple AA sample positions to produce a result that looks like higher quality anti-aliasing but with better performance (for example an image that looks similar to 4xMSAA at the compute cost of roughly 2xMSAA).

The idea is to sample two frames (frame n and frame n-1) with half as many sample positions as required by 4xMSAA. The results are then combined into a single image, hence the name: multi-frame.

These three operations, two sampling actions and the composition into a single frame, are less demanding than sampling a single frame using 4xMSAA. Unfortunately we could not test the new technology because it is not implemented in Nvidia’s current driver.
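Since we could not test MFAA, here is a toy Python sketch of the principle instead. It looks at a single pixel crossed by a vertical edge and compares coverage from four sample positions against two frames that each use a different half of those positions. The sample positions, the equal-weight blend and the function names are our own assumptions for illustration; Nvidia’s actual sample pattern and temporal filter are not described here.

```python
def coverage(edge, sample_positions):
    """Fraction of sample positions lying left of an edge inside a unit-wide pixel."""
    return sum(1 for s in sample_positions if s < edge) / len(sample_positions)


# Hypothetical sample positions across one pixel (0.0 = left side, 1.0 = right side).
FOUR_X = (0.125, 0.375, 0.625, 0.875)
EVEN_FRAME = (0.125, 0.625)  # two samples used on frame n
ODD_FRAME = (0.375, 0.875)   # the other two samples, used on frame n-1


def mfaa_like(edge_prev, edge_now):
    """Blend two 2-sample frames into one result with equal weights."""
    return 0.5 * (coverage(edge_prev, ODD_FRAME) + coverage(edge_now, EVEN_FRAME))


edge = 0.3
print(coverage(edge, FOUR_X))      # 4-sample reference
print(coverage(edge, EVEN_FRAME))  # a single 2-sample frame on its own
print(mfaa_like(edge, edge))       # two 2-sample frames combined
```

For a static edge the combined result equals the four-sample reference, while each frame only pays for two samples. When the edge moves between frames the two halves no longer line up, which is where a real implementation needs a smarter filter than this plain average.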

Nvidia is working hard to make it available in an upcoming driver release. The next image set explains the procedure. As you can see from the resulting image, the quality of MFAA is approximately the same as with 4xMSAA.

The samples are from Nvidia and they show an Apollo 11 demo based on the UE4 engine.

MFAA-1

 MFAA-22

 MFAA-3

MFAA-4

 MFAA-5


Dynamic Super Resolution (DSR) provides an easy way for gamers to improve the image quality in less demanding games. DSR brings the crispness of higher resolutions to gamers’ existing displays. We like the idea since a lot of people enjoy undemanding games and DSR can give them a splash of eye candy.

The basic idea revolves around using unharnessed GPU power to render the scene at a higher resolution, then scaling the image down to the native resolution. While it may sound like a pointless waste, a similar approach has been used in professional off-line graphics for years.

For example, a gamer with a 1080p display could use DSR to have the GPU render an image at a substantially higher resolution, which is then scaled back to 1080p, while still retaining more detail than a standard 1080p rendering. It is not 4K resolution quality, but it simply adds a bit more detail to lower resolution games.

DRS-1

DRS-2

The differences are most evident where you would expect them to be – on high contrast areas and thin objects such as hair, grass, wires and foliage. Basically stuff that looks terrible without antialiasing.

The next image shows the difference on a 1080p scene with and without DSR.

DRS-3

Dynamic Super Resolution is similar to the downsampling used by enthusiasts today. However, many games were not designed with downsampling in mind, leading to artifacts in the final rendered image.

To address this issue, DSR uses a 13-tap Gaussian filter during the conversion to display resolution. This high-quality filter reduces or eliminates the aliasing artifacts experienced with downsampling, which relies on a simpler box filter (which is also computationally less demanding).
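The difference between the two filters is easier to see in one dimension. The Python sketch below downsamples a row of pixels by a factor of two, once with a 2-tap box filter and once with a 13-tap Gaussian kernel. It is a simplified illustration only: the sigma value, the edge clamping and the function names are our assumptions, and Nvidia’s actual filter weights and the mapping from the smoothness setting to those weights are not public in this review.

```python
import math


def gaussian_kernel(taps=13, sigma=2.0):
    """Normalized 1D Gaussian weights; sigma is a hypothetical smoothness value."""
    half = taps // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def box_kernel(taps):
    """Equal weights, i.e. a plain average of neighbouring pixels."""
    return [1.0 / taps] * taps


def downsample(row, factor, kernel):
    """Filter a row of pixel values and keep one output per `factor` inputs."""
    start = -(len(kernel) // 2)
    out = []
    for center in range(factor // 2, len(row), factor):
        acc = 0.0
        for tap, weight in enumerate(kernel):
            # Clamp to the row so taps near the border reuse the edge pixel.
            index = min(max(center + start + tap, 0), len(row) - 1)
            acc += weight * row[index]
        out.append(acc)
    return out


high_res = [0.0] * 8 + [1.0] * 8  # a hard edge rendered at twice the display width
box = downsample(high_res, 2, box_kernel(2))
smooth = downsample(high_res, 2, gaussian_kernel())
print(box)
print([round(value, 3) for value in smooth])
```

The box filter keeps the edge perfectly hard, while the wider Gaussian spreads the transition over several output pixels. That is the trade-off the smoothness setting controls: more taps and a wider kernel suppress aliasing but soften the image and cost more to compute.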

GeForce Experience uses the default 33% smoothness level for the Gaussian filter. Alternatively, you can use the DSR Smoothness setting in the driver’s control panel to adjust the smoothness level. Simply open the control panel and navigate to Manage 3D Settings and choose DSR Factors to select your screen resolution (settings ranging from 1.2x-4.0x native resolution in the current implementation) and use the DSR Smoothness setting to adjust the Gaussian filter, then launch your favorite game. You should be able to find the new DSR resolution inside the graphics menu settings of the newly launched game.

Another issue with conventional downsampling is ease of use. Because it isn’t a native solution, downsampling requires users to trick the GPU into thinking it’s connected to a higher resolution display than it actually is. In order to do this, users have to create custom screen resolutions and often have to adjust various low-level display parameters in order for downsampling to work. 

In comparison, Dynamic Super Resolution can be found inside GeForce Experience, where Nvidia provided convenient and easy to use controls for DSR with optimized game settings. The user simply needs to click the “Optimize” button and that’s it.

DRS-5

DSR can be used with GTX 980 / 970 cards, but you need a screen capture card or FCAT in order to take screenshots with DSR enabled. Otherwise you will simply get higher resolution screenshots, which do not illustrate what you are seeing on your lower resolution screen.


In order to keep the temperatures at bay, Nvidia decided to use an existing cooler, which isn’t a bad thing. We already had a chance to get acquainted with the Nvidia Kepler reference cooler and we found it more than adequate. We last saw it on the GTX 780 Ti, where it served the card very well indeed. The cooler was designed to provide superior cooling performance and generate very little noise in the process.

 2-GTX 980
The cooler sports a silver aluminum casing for the cover and a polycarbonate window for show-offs. Through the window you see the heatsink. The card is 26.5cm long and it is 11cm tall and obviously it is a dual-slot design. On the upper side we see the GeForce GTX logo with LED backlighting, which was also used on the GTX 690, GTX 780, GTX 780 Ti and Titan.

3-GTX 980

A 6+6 pin power connection setup is used on the GTX 980 and GTX 970. The GTX 780 Ti and GTX 780 require a combination of 6+8 pin power connectors. The GTX 780, GTX 780 Ti and Titan have a TDP of 250W, compared to GTX 980 with a 165W TDP and GTX 970 with a 145W TDP.

The fan management is very good, so the 75mm fan seems rather docile and won’t surprise you with sudden rpm changes. Nvidia’s Boost technology takes care of fan management quite well. The GPU temperature threshold is set at 80 degrees Celsius and the card overclocks automatically until the temperature reaches the threshold, provided the load and power consumption readings are not already maxed out.

During auto-overclocking the temperature could be kept within the 80C envelope by accelerating the fan, but Nvidia wanted to keep things quiet, so Boost 2.0 tends to reduce the GPU Boost clock instead of speeding up the fan. Users who place an emphasis on low noise should like this approach, but those who want more performance can always play around with the settings.

The secret to good cooling is a good vapor chamber design, which also allows engineers to come up with compact yet powerful heatsinks that don’t need too many expensive and bulky heatpipes.

5-GTX 980

Three heatpipes are embedded in the base of the heatsink and they transfer heat from the GPU to the fins. Nvidia uses high purity water in the heatpipes to improve their effectiveness.
6-GTX 980

7-GTX 980

 There are no memory modules at the back of the PCB. The GTX 980 has 4GB of memory.
8-GTX 980
GeForce GTX 980’s backplate is a stylish addition that provides additional protection for the board as well as additional cooling for the back of the PCB and a few components.

9-GTX 980
10-GTX 980

New to the GeForce GTX 980 reference board design is a partially removable backplate. This is necessary for gamers with 3-Way SLI configurations (or users who have their GeForce GTX 980 cards running directly side-by-side without an open slot between them) in order to improve airflow.

11-GTX 980

While it may seem like a small area, Nvidia’s engineers spent a considerable amount of time studying airflow between boards and determined that this region is critical for feeding air directly into the adjacent fan. Removing this segment of the backplate significantly improves airflow between boards when the cards are directly adjacent to each other, and GeForce GTX 980 users can still enjoy the aesthetic benefits of the backplate.

The GTX 980 is built for multi-GPU action. In addition to standard dual-SLI, it can also be used in triple- and quad-SLI setups, as it features two SLI connectors. The fact that this is a dual-slot design also helps.

As far as video outputs go, few users will have a gripe, since the card features three DisplayPort connectors, an HDMI 2.0 connector (allowing you to run 4K@60Hz), and a dual-link DVI output for a total of five connectors. Up to four can be used at any time.

Three G-SYNC displays can be driven from one GeForce GTX 980 card, for instance. Here it’s important to note that when connecting multiple displays across more than one card, you may see performance differences (similar to previous GPUs). For example, a 3-Way SLI configuration with one display connected to each card may perform differently from one with all three displays plugged into the same card.

4-GTX 980

The I/O bracket also doubles as an exhaust vent, helping reduce temperatures within the chassis.



1-GTX 980

Testbed:
- Motherboard: Intel DZ87KLT-75K
- CPU: Intel Core i7 4770K, 4x3.5GHz (Haswell)
- CPU Cooler: EVGA
- Memory: 2x4GB Corsair DDR3 2400MHz  
- Harddisk:   Corsair Neutron GTX 240GB
- Case: CoolerMaster Cosmos II
- Operating System: Win8.1 64-bit

Drivers:
- Nvidia: 344.07_geforce_win8_winvista_win7_64bit_international.exe
- AMD: 14-4-win7-win8-win8.1-64-dd-ccc-whql.exe
- AMD (R9 285): AMD-Catalyst-14.30.1005-Radeon-R9-285-Windows-Aug29


res mark 1

res mark 2




res dogs 1

res dogs 2


res crysis 3 1

res crysis 3 2


res thief 1

res thief 2


res bat 1

res bat 2


res bio 1

res bio 2


res tomb 1

res tomb 2


res far 1

res far 3 1


res heaven 1

res heaven 2



The reference GTX 980 and GTX 970 use the much lauded Titan cooler, which does a very good job of keeping temperatures below the 82 degrees Celsius mark while remaining reasonably quiet during gaming. The fan speed ranged from 1140RPM at idle to 2058RPM when gaming. Nvidia’s cooler is the ideal solution for reference clocked GTX 980 cards and it is pretty hard to improve upon. It will definitely be interesting to see what Nvidia’s AIB partners have prepared for the custom and factory-overclocked GTX 980 graphics cards.

As you can see, the Boost clock went up to 1240MHz (the Base clock is 1126MHz).

gtx 980 temp in battlefield

The default temperature limit on the GTX 980 card is set at 80 degrees Celsius. The cooler does a pretty good job and the temperature never exceeds 81 degrees Celsius. This basically means that the card will reduce the Boost clock once it hits the threshold, staying within the thermal envelope without revving up the fan.


The GTX 980 does not bring a huge performance boost compared to the previous generation, but it proves that Maxwell is more efficient than Kepler. The GTX 980 has a TDP of just 165W. For comparison, the mid-range R9 270X has a TDP of 180W, while AMD’s latest R9 285 needs 190W of power. The old Tahiti-based R9 280 has a TDP of 200W, while the 280X churns out 250W, just like the GTX 780 Ti.

 res power


The GeForce GTX 980 comes with 2048 CUDA cores and 4GB of GDDR5 memory, and it is based on the new Maxwell GM204 GPU. The GeForce GTX 780 Ti comes with 2880 CUDA cores and 3GB of GDDR5 memory, and it is based on the Kepler GK110B GPU.

Although their performance is similar and they are in the same market segment, the cards are very different. The GTX 980 is much more efficient, with a TDP of 165W, while the GTX 780 series had a TDP of 250W. This is good news on more than one level.

As far as performance goes, the GTX 980 does not pull ahead of the GTX 780 Ti by any significant margin. While it wins in some tests, in others it loses to the old GTX 780 Ti.

Nvidia’s drivers are not ready yet and it is currently impossible to try out the new multi-frame anti-aliasing (MFAA) feature. Once the drivers are optimized, it is very likely that Kepler cards will lose to new Maxwell products.

Gaming at resolutions up to 2560x1600 is not a problem for the GTX 980 and it can deal with practically any title out there. As far as 4K and UHD gaming go, you will need two of these babies to crank up the eye candy in such high resolutions.

The GTX 980 employs a cooler that is capable of handling the much hotter GTX 780 Ti and GTX Titan cards, hence it has no problem dealing with the vastly more efficient GTX 980. The cooler is silent in idle mode and it is very quiet under load.

Nvidia launched three new technologies along with the GTX 980. Voxel Global Illumination (VXGI) offers an efficient way of computing dynamic global illumination in real time, while MFAA can in theory deliver effects similar to 4xMSAA at a much lower computational cost. Unfortunately we could not try it out this time around.

Dynamic Super Resolution (DSR) provides an easy way for gamers to improve the image quality in less demanding or older games. We saw DSR in a demo and it is up to developers to harness this power in games. DSR is universal and can be applied to any game by activating it in the driver.

With all the figures in mind, it is evident that the GM204 is the most efficient high-end GPU we have had a chance to test so far.

In terms of pricing, it should be noted that the GTX 780 debuted at $649, while the GTX 780 Ti launched at $699. The GTX 980 launches with a suggested retail price of $549, while the GTX 970 is priced at $329.

Since the GTX 980 and GTX 970 don’t deliver much of a performance gain, users of 780-series cards have no reason to upgrade. However, those coming from an older card do. What’s more, thanks to their efficiency, Nvidia’s new 900-series cards could even be used to upgrade gaming rigs originally based on mid-range components, with relatively weak PSUs. Our tests prove that you don’t need a massive PSU to build a high-end gaming rig with Maxwell graphics and this saves money both during the build and in the long run.

fudz topvalue ny

Last modified on 19 September 2014