AVX-512: advantages and disadvantages

Instruction set extensions are common on many architectures. The latest x86 microprocessors support many of them, including AVX-512, which works on 512-bit data. That is, instead of operating on 64-bit values like the rest of the CPU, these instructions use a dedicated set of wide vector registers and execution units.

Thanks to this, a single instruction can carry out several operations at once, instead of working in a scalar way, that is, one data element at a time. The AVX extensions work on data vectors, applying the same operation to every element. For example, one instruction can operate on 8 64-bit values at the same time, or on 16 32-bit values, etc. However, although they can speed up many workloads, such as scientific ones, AVX instructions are not all advantages…
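
To make the lane arithmetic concrete, here is a minimal Python sketch. It only models the idea: the function names are invented for this illustration, and real AVX-512 code would use compiler intrinsics or auto-vectorization rather than Python lists.

```python
import math

REGISTER_BITS = 512  # width of one AVX-512 register


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of elements of the given size that fit in one register."""
    return register_bits // element_bits


def instructions_needed(n_elements, element_bits, register_bits=REGISTER_BITS):
    """Vector instructions required to process n_elements, one register at a time."""
    return math.ceil(n_elements / lanes(element_bits, register_bits))


def vector_add(a, b):
    """Model of one SIMD add: the same operation applied to every lane."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


print(lanes(64))                      # 8 lanes of 64 bits
print(lanes(32))                      # 16 lanes of 32 bits
print(instructions_needed(1000, 64))  # 125 vector adds instead of 1000 scalar adds
```

The last line shows where the potential gain comes from: adding 1000 64-bit values takes 1000 scalar additions, but only 125 vector additions when 8 values fit in each register.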

What is AVX-512?

In addition to the base ISA, with the basic instructions of the AMD64, EM64T or x86-64 architecture, whatever you want to call it, there are many extensions, that is, additional sets of instructions that complete the ISA and speed up certain workloads. Libraries such as TensorFlow, for example, can take advantage of them. Among them is the AVX-512 instruction set.
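
Because these extensions are optional, software usually checks for them before using them. As a rough sketch, assuming a Linux system where the kernel lists CPU features on the "flags" line of /proc/cpuinfo, the check could look like this (the helper names are made up for this example):

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_subsets(flags):
    """Return the AVX-512 related flags, sorted for stable output."""
    return sorted(f for f in flags if f.startswith("avx512"))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; returns '' if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


found = avx512_subsets(parse_cpu_flags(read_cpuinfo()))
print("AVX-512 subsets reported:", ", ".join(found) if found else "none")
```

On systems without that file, or on CPUs without AVX-512, the sketch simply reports "none".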

AVX stands for Advanced Vector Extensions, and AVX-512 is the successor to AVX2, widening the vectors from 256 to 512 bits. Intel announced the instruction set in 2013. It was implemented for the first time in the Intel Xeon Phi (Knights Landing), and later it also reached servers and high-end desktops with the Intel Xeon (Skylake-SP) and Core X (Skylake-X).

But Intel made a questionable and widely criticized decision: AVX-512 would also reach consumer processors, that is, client processors. It debuted there with the Intel Cannon Lake microarchitecture and was later inherited by Ice Lake, Tiger Lake, etc. Many found this incomprehensible, since it implies higher power consumption and more silicon area for the large execution units (and wider registers), while adding practically no value, because very little client software can benefit from these instructions.

The main purpose of this instruction set was to speed up tasks such as data compression, image processing, and cryptographic computations. With vectors twice as wide as those of the 256-bit AVX2, AVX-512 offered significant performance improvements in code that could use it, but despite the added complexity, it did not deliver nearly twice the performance in practice.
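
One way to see why doubling the vector width does not double performance is an Amdahl-style estimate. The sketch below is only an illustration: the function name and the example numbers are hypothetical, not measurements of any real processor.

```python
def estimated_speedup(vector_fraction, width_gain=2.0, clock_ratio=1.0):
    """Amdahl-style estimate of the speedup from wider vectors.

    vector_fraction: share of the original run time spent in code that benefits
    width_gain:      how much faster that code gets (2.0 when the width doubles)
    clock_ratio:     clock speed under wide-vector load relative to before
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    relative_time = (1.0 - vector_fraction) + vector_fraction / width_gain
    return clock_ratio / relative_time


# Hypothetical inputs, chosen only to show the trend.
print(f"{estimated_speedup(1.0):.2f}")                   # 2.00: everything vectorizes
print(f"{estimated_speedup(0.5):.2f}")                   # 1.33: half the run time vectorizes
print(f"{estimated_speedup(0.5, clock_ratio=0.9):.2f}")  # 1.20: same, with a 10% lower clock
```

Only a program that spends all its time in vector code and keeps its clock speed gets the full 2x. Anything less, and the gain shrinks quickly.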

All of this drew a lot of criticism. Linus Torvalds, among others, was particularly harsh:

I hope AVX-512 dies a painful death, and that Intel starts fixing the real problems instead of trying to create magic instructions and then create benchmarks that they can look good on.

I hope Intel gets back to basics: get their lithography working again, and focus more on regular code that isn’t HPC (high performance computing) or some other nonsensical special case.

I’ve said it before, and I’ll say it again: In the heyday of the x86 architecture, when Intel was laughing all the way to the bank and killing all their competition, absolutely everyone else was doing better than Intel at FP workloads. Intel’s FP performance sucked (relatively speaking), and it didn’t matter one bit.

Because absolutely no one cares outside of benchmarks.

The same goes for AVX-512 now and in the future. Yes, you can find things where AVX-512 matters. No, those things don’t sell machines in the big picture.

And AVX-512 has real drawbacks. I’d rather see that transistor budget used on other things that are much more relevant. Even if it’s still FP math (on the GPU, instead of AVX-512). Or just give me more cores (with good single-thread performance, but without the crap like AVX-512) like AMD did.

I want my power limits to be hit with regular integer code, not some AVX-512 power virus that takes away my max frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space).

Yes, yes, I am biased. I absolutely hate FP benchmarks, and I realize that other people really care. I just think AVX-512 is exactly what not to do. It’s a pet peeve of mine. It’s a good example of something Intel has done wrong, partly because it has increased market fragmentation.

Drop the special-case crap, and make all the ordinary stuff that everyone cares about work as smoothly as you humanly can. Then make an FPU that’s just barely good enough on the side, and people will be happy. AVX2 is much more than enough.

Yes, I’m in a bad mood.

Linus

Around the same time, the famous programmer and creator of Linux also switched to an AMD Threadripper machine for compiling, moving away from the Intel processors he had always used in his computers…

End of AVX-512? No, just the beginning of a soap opera…

AVX-512 was both a good and a bad idea. Intel got ahead of itself: there was no software to justify implementing it on the client side, although there was for HPC. AMD was smarter in this regard and chose not to adopt AVX-512 until more software could take advantage of it. That moment came with Zen 4, in the current Ryzen 7000 Series.

Intel, for its part, now seems somewhat lost: it was the promoter of AVX-512 and has blocked it from Alder Lake onwards. It is true that the first Alder Lake chips could execute AVX-512 on the Golden Cove-based P-cores, but not on the Gracemont-based E-cores. Having cores with different instruction sets in the same chip is a problem for the operating system’s scheduler, which may move a thread from one kind of core to the other, so Intel opted to disable the instructions, even though the P-cores could physically run them.

Let’s take a look at the clumsy and weird moves Intel made with AVX-512:

  1. Intel tells the press that AVX-512 will not be supported on hybrid processors with P and E cores, perhaps because of the bad reputation and criticism received.
  2. Before the launch, an optimization guide from Intel itself appears describing the use of AVX-512 on Alder Lake, which is very disconcerting after the previous message.
  3. Intel therefore has to go back to the press, denies once again that AVX-512 is supported, and removes the references from the guide.
  4. Alder Lake launches, and the first users to try it discover that some motherboards ship firmware that lets AVX-512 work, something motherboard manufacturers did against Intel’s will.
  5. Intel remains silent toward the international press, although it assures Taiwanese media that AVX-512 support is still present in Alder Lake, just not activated by default. Enabling it or not is left up to the user.
  6. After this, Intel announces that it will disable AVX-512 through a new firmware or microcode update. It intends to put an end to the controversy this way, but does not succeed…
  7. The same week the update is announced, BIOS/UEFI updates that disable AVX-512 appear.
  8. Some motherboard manufacturers, like MSI, find a way around Intel’s lockdown and allow the feature to be enabled from their motherboards’ BIOS/UEFI, as a lure for potential buyers.
  9. Intel sees the ghost come back to life and finally decides to disable the feature in hardware. That is to say, in the first Alder Lake chips it was physically available and in those released later it is not. Therefore, if you want AVX-512 enabled, you have to buy Intel Xeons, which are of course more expensive. Marketing or mistake? Judge for yourself…

Meanwhile, AMD is going in a different direction:

  1. While Intel adopted AVX-512 for client processors as well, and was criticized for it, AMD stayed focused on adding more processing cores instead of making the existing ones more complex.
  2. When the time comes to design Zen 4, AMD decides it is a good moment to implement these instructions in its AMD Ryzen 7000 Series and EPYC. However, these chips do not support the entire Intel set: they leave out AVX512ER and AVX512PF (Knights Landing), AVX512_4VNNIW and AVX512_4FMAPS (Knights Mill), and an extension introduced with Tiger Lake. AMD also appears to have kept its execution units simpler than Intel’s.
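
Since AVX-512 is really a family of subsets that differ from one processor to another, software should check for the specific subsets it needs rather than for "AVX-512" as a whole. A minimal sketch of that check follows. The flag names use the Linux /proc/cpuinfo spelling, and the sample flag set is illustrative only, not a complete or authoritative list for any real CPU.

```python
def missing_features(required, available):
    """Return the required feature flags that the CPU does not report, sorted."""
    return sorted(set(required) - set(available))


def pick_code_path(required, available):
    """Choose the AVX-512 path only if every required subset is present."""
    return "avx512" if not missing_features(required, available) else "fallback"


# Illustrative flag set only; not an authoritative list for any processor.
sample_cpu = {"avx2", "avx512f", "avx512dq", "avx512bw", "avx512vl", "avx512_vnni"}

print(pick_code_path({"avx512f", "avx512vl"}, sample_cpu))    # avx512
print(missing_features({"avx512f", "avx512er"}, sample_cpu))  # ['avx512er']
print(pick_code_path({"avx512f", "avx512er"}, sample_cpu))    # fallback
```

A program that depends on a subset such as AVX512ER would take the fallback path here, even though the processor reports the AVX-512 foundation instructions.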