TeraFLOPS: what they are and what they measure

FLOPS and their multiples, such as TeraFLOPS, have long been used to measure the performance of processing units such as CPUs and GPUs. However, they are not the best metric in every case and can give a false impression. Graphics cards are a good example: more TeraFLOPS is not always better…

Index of contents

  • What is a FLOP?
  • Multiples and TeraFLOPS
  • Examples
  • Is it really a suitable measurement for graphics cards?
    • Time to think about TeraFLOPS
  • So? What is better to measure GPU performance?
  • Conclusion

What is a FLOP?

FLOPS (Floating Point Operations Per Second) is a unit of performance: it measures how many floating point operations a machine can process each second. It is used above all to rate processing units such as CPUs and GPUs.

FLOPS is sometimes also written FLOP/s, although that form is rare. Either way, you first need to understand what floating point is. For a quick definition, IBM puts it well:

A method of encoding real numbers within the limits of finite precision available in computers.

That is, computers can process fixed-point or integer data, which are numbers such as -1, +2, 9009, 10, 0 or -20, and they can also work with real numbers written in a form of scientific notation (a mantissa, an exponent and a base, also called the radix). The latter are floating point numbers. Floating point data generally needs more hardware resources to process than integer data, and the operations often take more clock cycles.
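To make the mantissa and exponent idea concrete, here is a minimal Python sketch. It relies on the standard library's `math.frexp`, which splits a float into a mantissa and a base-2 exponent; the helper name `decompose` is only illustrative and not part of any library.

```python
import math

def decompose(x: float) -> tuple[float, int]:
    """Split x into (mantissa, exponent) so that x == mantissa * 2**exponent."""
    return math.frexp(x)

mantissa, exponent = decompose(6.5)
# 6.5 is represented as 0.8125 * 2**3
print(mantissa, exponent)
# The original value is rebuilt exactly from its two parts
print(math.ldexp(mantissa, exponent))
```

Note that `math.frexp` normalizes the mantissa to the range [0.5, 1), which is one convention among several; the hardware encoding stores the same information in a slightly different form.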

The exponent in floating point encodings was defined in base two on machines such as the legendary Cray systems and the Digital Equipment Corp. VAX, whereas IBM used base 16 (hexadecimal) in its IBM Floating Point Architecture. To standardize all this, the ANSI/IEEE Std 754-1985 standard was created, which uses base two. Today the vast majority of processors follow this standard for 32-bit (single precision) and 64-bit (double precision) values, and there are also wider formats (extended precision) and narrower ones such as 16-bit (half precision). You will often see these and related formats referred to as FP64, FP32, FP16, FP8, etc.
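The practical difference between these precisions can be illustrated with a small Python sketch. It stores the same value in 16-bit, 32-bit and 64-bit IEEE 754 formats using the standard `struct` module and reads it back; the helper name `round_trip` is hypothetical, and the error is measured against Python's own `float`, which is already a 64-bit value.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Store value in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# struct format codes: "e" = 16-bit, "f" = 32-bit, "d" = 64-bit
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}

value = 0.1
errors = {name: abs(round_trip(value, fmt) - value) for name, fmt in FORMATS.items()}

for name, err in errors.items():
    print(f"{name}: stored error = {err:.3e}")
```

The narrower the format, the larger the rounding error, which is why the precision (FP16, FP32 or FP64) always has to be stated alongside a FLOPS figure.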

Floating point data is common in scientific and multimedia software, such as graphics. This is why it is especially important in fields like HPC and also for graphics cards.

Today, FLOPS have gained relevance in fields where floating point workloads are essential, although MIPS is still used as a performance measure in other cases. Both measure performance, but MIPS is better suited to workloads such as databases, word processing and spreadsheets.

MIPS stands for Millions of Instructions Per Second. Do not confuse IPS with IPC, which is the number of instructions executed per clock cycle. There is also the IOPS (Integer Operations Per Second) unit, the equivalent of FLOPS for integers, both signed and unsigned.
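The relationship between IPC and IPS is simple arithmetic: instructions per second equal instructions per cycle multiplied by cycles per second. A minimal sketch, using a purely hypothetical core (the function name and the numbers are assumptions chosen for illustration):

```python
def mips(ipc: float, clock_hz: float) -> float:
    """Millions of instructions per second from IPC and clock frequency."""
    return ipc * clock_hz / 1e6

# Hypothetical core: 4 instructions per cycle at 3.0 GHz
print(mips(ipc=4, clock_hz=3.0e9))
```

Real processors rarely sustain their maximum IPC, so a figure computed this way is an upper bound rather than a measured result.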

Frank H. McMahon, while working at the Lawrence Livermore National Laboratory, came up with the idea of using FLOPS to compare the performance of different supercomputers. This allowed him to stop using MIPS, which, as the previous paragraphs suggest, was better suited to comparing PCs.

Multiples and TeraFLOPS

This article, however, is about TeraFLOPS. Put simply, they are a multiple of FLOPS. When McMahon devised the metric, machines processed comparatively few floating point operations per second and large multiples were not needed. Today, much larger units are required to express the enormous performance of current processing units and supercomputers.

| Prefix | Symbol | Order of magnitude | Full name |
|---|---|---|---|
| giga- | G | 10^9 | gigaFLOPS (GFLOPS) |
| tera- | T | 10^12 | teraFLOPS (TFLOPS) |
| peta- | P | 10^15 | petaFLOPS (PFLOPS) |
| exa- | E | 10^18 | exaFLOPS (EFLOPS) |
| zetta- | Z | 10^21 | zettaFLOPS (ZFLOPS) |
| yotta- | Y | 10^24 | yottaFLOPS (YFLOPS) |

Smaller multiples such as MFLOPS (megaFLOPS) were also used at first, but that unit is now far too small for most current processing units.
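Converting between these multiples is only a matter of powers of ten. The following Python sketch shows one way to do it; the names `PREFIXES`, `format_flops` and `to_tflops` are illustrative and not taken from any library.

```python
# SI prefixes used with FLOPS, largest first
PREFIXES = [
    ("Y", 1e24), ("Z", 1e21), ("E", 1e18), ("P", 1e15),
    ("T", 1e12), ("G", 1e9), ("M", 1e6),
]

def format_flops(flops: float) -> str:
    """Express a raw FLOPS figure with the largest prefix that fits."""
    for symbol, factor in PREFIXES:
        if flops >= factor:
            return f"{flops / factor:g} {symbol}FLOPS"
    return f"{flops:g} FLOPS"

def to_tflops(flops: float) -> float:
    """Convert a raw FLOPS figure to TeraFLOPS."""
    return flops / 1e12

print(format_flops(37.73e9))   # a figure in the GFLOPS range
print(format_flops(1e15))      # a figure in the PFLOPS range
print(to_tflops(37.73e9))      # the same GFLOPS figure expressed in TeraFLOPS
```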

Examples

Over the years, the performance figures reached by some of the most popular CPUs and GPUs, as well as by the most powerful machines in the world, have been recorded, and some important conclusions can be drawn from them. One is that CPUs deliver much lower floating point throughput than GPUs, hence the preference for using GPUs as GPGPU accelerators for scientific and AI workloads.

For example:

  • An Intel Core i5-9600K @ 3.7 GHz achieved 37.73 GFLOPS, that is, about 0.03773 TeraFLOPS.
  • The Apple M1 can deliver 154 GFLOPS, or 0.154 TeraFLOPS.
  • The ARM processor in a Raspberry Pi 4B reached 6.69 GFLOPS, or about 0.0067 TeraFLOPS.
  • In 2008, the IBM Roadrunner was launched, one of the most powerful supercomputers in the world at the time and the leader of the Top500 list. It could reach 1 PetaFLOPS, or 1000 TeraFLOPS. That may sound like a lot, but at around the same time single graphics cards were already reaching the TeraFLOPS level: the AMD Radeon HD 4800 series was among the first to do so, and in August of that same year AMD released the Radeon HD 4870 X2, with two RV770 GPUs, which reached 2.4 TeraFLOPS.

As these figures show, graphics cards or GPUs have a much higher floating point potential than CPUs.

Is it really a suitable measurement for graphics cards?

One of the reasons for writing this article is the growing popularity of the TeraFLOPS as a way of rating the latest gaming graphics cards as well. The question is whether TeraFLOPS really matter as much for video games as they seem to.

Let me say up front that FLOPS are only a meaningful figure for scientific applications and HPC, where floating point throughput matters for the workloads involved. That is especially true when the GPU is used as a GPGPU, that is, as an accelerator for general-purpose processing. But not for gaming… Want proof? Let's go.

Time to think about TeraFLOPS

For example, look at this table:

| GPU | FP32 TeraFLOPS | FP64 TeraFLOPS | Ratio |
|---|---|---|---|
| GeForce RTX 3090 | 35.58 | 0.556 | FP64 = 1/64 FP32 |
| GeForce RTX 3080 | 29.77 | 0.465 | FP64 = 1/64 FP32 |
| Radeon RX 6900 XT | 23.04 | 1.44 | FP64 = 1/16 FP32 |
| Radeon RX 6800 XT | 20.74 | 1.296 | FP64 = 1/16 FP32 |

As we can see, an AMD Radeon RX 6900 XT can reach 1.44 TeraFLOPS in double precision floating point (FP64), while the NVIDIA GeForce RTX 3090 reaches only about 0.56 TeraFLOPS. For comparison, the newer GeForce RTX 4090 can reach about 83 TeraFLOPS at FP32, that is, more than double the RTX 3090 and almost quadruple the Radeon RX 6900 XT, and 1.29 TeraFLOPS at FP64. The AMD Radeon RX 7900 XTX, for its part, delivers 61.42 TeraFLOPS at FP32 and 1.91 TeraFLOPS at FP64.
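The comparisons in this paragraph are plain ratios of the quoted figures. A short Python sketch makes them explicit; the dictionary layout and the `ratio` helper are illustrative choices, and the RTX 4090 FP32 value uses the more precise 82.58 quoted later in the article.

```python
# FP32 / FP64 TeraFLOPS figures quoted in this article
tflops = {
    "RTX 3090":    {"fp32": 35.58, "fp64": 0.556},
    "RX 6900 XT":  {"fp32": 23.04, "fp64": 1.44},
    "RX 6800 XT":  {"fp32": 20.74, "fp64": 1.296},
    "RTX 4090":    {"fp32": 82.58, "fp64": 1.29},
    "RX 7900 XTX": {"fp32": 61.42, "fp64": 1.91},
}

def ratio(a: str, b: str, precision: str) -> float:
    """How many times card a's throughput is that of card b."""
    return tflops[a][precision] / tflops[b][precision]

comparisons = [
    ("RTX 4090", "RTX 3090", "fp32"),
    ("RTX 4090", "RX 6900 XT", "fp32"),
    ("RTX 3090", "RX 6900 XT", "fp32"),
    ("RTX 3090", "RX 6800 XT", "fp64"),
    ("RX 7900 XTX", "RTX 4090", "fp64"),
]
for a, b, precision in comparisons:
    print(f"{a} vs {b} ({precision}): {ratio(a, b, precision):.2f}x")
```

These ratios describe theoretical floating point throughput only; they say nothing yet about frames per second.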

Now, looking at these figures, ask yourself:

  • Is the GeForce RTX 3090 about 54% better than the Radeon RX 6900 XT, as the FP32 data indicates?
  • Is the GeForce RTX 3090 less than half as powerful as the Radeon RX 6800 XT, as the FP64 data indicates?
  • Is the new RTX 4090 more than twice as powerful as an RTX 3090, as the FP32 data indicates?
  • Is the Radeon RX 7900 XTX much more powerful than the RTX 4090, as the FP64 data indicates? Or is the RTX 4090 vastly superior to the RX 7900 XTX, as the FP32 data indicates?

Well, the answer to all these questions is no. TeraFLOPS figures do not determine the performance achieved in gaming tests. They are only an appropriate measure of a GPU's floating point throughput, which depends largely on the number of FPUs and on the architecture.
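The dependence on the number of FPUs can be sketched with the usual rule of thumb for theoretical peak throughput: shader units × clock × 2, where the factor 2 assumes each unit completes one fused multiply-add (counted as two operations) per cycle. The shader counts and boost clocks below come from publicly listed specifications, not from this article, so treat them as assumptions; the function name is illustrative.

```python
def peak_tflops(shader_units: int, boost_clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak FP32 TeraFLOPS (assumes one FMA per unit per cycle)."""
    return shader_units * boost_clock_ghz * flops_per_cycle / 1000

# (shader units, boost clock in GHz): assumed spec-sheet values
specs = {
    "GeForce RTX 3090":  (10496, 1.695),
    "GeForce RTX 3080":  (8704, 1.710),
    "Radeon RX 6900 XT": (5120, 2.250),
    "Radeon RX 6800 XT": (4608, 2.250),
}

for name, (units, clock) in specs.items():
    print(f"{name}: {peak_tflops(units, clock):.2f} TFLOPS")
```

Under these assumptions the results line up with the FP32 column of the table, which suggests that the quoted TeraFLOPS are a calculated ceiling rather than something measured in a game.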

So? What is better to measure GPU performance?

Finally, let's look at three more examples that will help answer the other question: what measures gaming performance better than TeraFLOPS? We will take three current models:

| GPU | FP32 TeraFLOPS | FP64 TeraFLOPS | Pixel rate (GPixel/s) | Texture rate (GTexel/s) |
|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 | 82.58 | 1.29 | 443.5 | 1290 |
| AMD Radeon RX 7900 XTX | 61.42 | 1.91 | 479.8 | 959.6 |
| Intel Arc A770 | 19.66 | 1 | 307.2 | 614.4 |

If we look at the performance tests carried out in video games, the performance order of these graphics cards would be as follows:

  1. NVIDIA GeForce RTX 4090
  2. AMD Radeon RX 7900 XTX
  3. Intel Arc A770

At first it might seem that they are ordered by FP32 performance, but the huge gaps between the FP32 figures do not reflect the actual differences in game performance. Also, going by the FP64 column, the AMD card should be the most powerful of the three, and that is not reflected in gaming either.

And what are pixel rate and texture rate? As the names suggest, they are the rates at which a GPU can output pixels and process textures, respectively. Are these better units for measuring performance? They are arguably closer to graphics workloads, since they describe how fast pixels and textures are generated.

  • The pixel rate is measured in GPixel/s, that is, billions (1,000,000,000) of pixels that the GPU can generate per second.
  • The texture rate is measured in GTexel/s, that is, billions (1,000,000,000) of texels (texture elements) that it can process per second.
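Like TeraFLOPS, these two rates are usually theoretical values derived from the hardware. A common way to estimate them is to multiply the clock by the number of render output units (ROPs) for the pixel rate and by the number of texture mapping units (TMUs) for the texture rate. The sketch below assumes ROP and TMU counts and boost clocks from publicly listed specifications, none of which appear in this article, and the function name is hypothetical.

```python
def fill_rates(rops: int, tmus: int, clock_ghz: float) -> tuple[float, float]:
    """Return (pixel rate in GPixel/s, texture rate in GTexel/s)."""
    return rops * clock_ghz, tmus * clock_ghz

# (ROPs, TMUs, boost clock in GHz): assumed spec-sheet values
specs = {
    "GeForce RTX 4090":   (176, 512, 2.520),
    "Radeon RX 7900 XTX": (192, 384, 2.499),
    "Arc A770":           (128, 256, 2.400),
}

rates = {name: fill_rates(*spec) for name, spec in specs.items()}
for name, (pixel, texel) in rates.items():
    print(f"{name}: {pixel:.1f} GPixel/s, {texel:.1f} GTexel/s")
```

With these assumed inputs the results are close to the values in the table above, so the same caveat applies: they are ceilings, not measurements.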

Looking at these figures, only one value is out of place: the GPixel/s of the AMD Radeon, which is second in the gaming ranking and yet would be the card that generates pixels the fastest. The remaining values are ordered in line with the performance these cards actually show in video games. That is, they would be somewhat more reliable for measuring and comparing performance.
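One way to check which metrics agree with the gaming ranking is to sort the three cards by each column and compare the result with the order observed in games. This is a minimal sketch using only the figures from the table; the variable and function names are illustrative.

```python
# Order observed in gaming tests, as listed in the article
gaming_order = ["RTX 4090", "RX 7900 XTX", "Arc A770"]

# Figures from the table, one dictionary per metric
metrics = {
    "FP32 TFLOPS":  {"RTX 4090": 82.58, "RX 7900 XTX": 61.42, "Arc A770": 19.66},
    "FP64 TFLOPS":  {"RTX 4090": 1.29,  "RX 7900 XTX": 1.91,  "Arc A770": 1.0},
    "Pixel rate":   {"RTX 4090": 443.5, "RX 7900 XTX": 479.8, "Arc A770": 307.2},
    "Texture rate": {"RTX 4090": 1290,  "RX 7900 XTX": 959.6, "Arc A770": 614.4},
}

def ranking(values: dict[str, float]) -> list[str]:
    """Card names sorted from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)

matches = {name: ranking(values) == gaming_order for name, values in metrics.items()}
for name, ok in matches.items():
    print(f"{name}: {'matches' if ok else 'does not match'} the gaming order")
```

Matching the order is a weak test, since it ignores how large the gaps are, and three cards are a very small sample, so this supports "somewhat more reliable" rather than anything stronger.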

Conclusion

In short, TeraFLOPS can be a good measure for HPC or GPGPU workloads. However, as a measure of gaming performance it is not reliable at all. Some manufacturers may quote these figures, but don't be fooled: they do so purely for marketing when the numbers make them stand out from the competition…