How a GPU works to generate a graphic

Since 3D graphics appeared in computer systems, a revolution has taken place in both video games and graphical interfaces. But do you really know how a GPU works to generate a graphic on the screen? Let’s walk through the whole process step by step.

Index of contents

What is a GPU?
What is a graphical API?
What is a graphics engine?
How is a graphic generated?
- The CPU starts work…
- GPU follows: Process starts in the rendering pipeline
  - Vertex Operation (Per-Vertex Operation)
  - Primitive Assembly
  - Primitive Processing
  - Rasterization
  - Fragment Processing
  - Per-Fragment Operation
- Finally, the screen displays the rendered image

What is a GPU?

The GPU, or graphics processing unit, is a vital component today. Essentially, the GPU is a processor specialized in processing graphics and sending them to the screen. As modern operating systems and multimedia applications have evolved, the GPU has become even more crucial.

In the early days of computer graphics, the CPU was responsible for processing all graphics, which put an excessive load on it. With the advent of the first 3D graphics accelerators, which gave way to modern graphics cards, lighting, shading, 3D rendering, and other graphic effects came to be handled by the GPU, reducing the load on the CPU. Notably, the CPU and GPU work together, with the CPU telling the GPU what tasks to perform based on the software it’s running. This is critical to understanding the entire process.

I would like to highlight some additional points that I consider important. When the CPU runs the software and sends commands and data to the GPU over the system bus, the first unit they reach on the GPU is the command processor. This unit reads the commands that originated in the API calls, already translated by the driver into a form the GPU understands, and dispatches the work to the appropriate units of the GPU.

Once the data reaches the command processor, the vertices are sent to the geometry units for processing. The results are then passed to the fragment generation units and finally to the image units. It is important to note that all of these steps depend on the GPU architecture, as some units may be different or may not be present on all GPUs.

In the case of integrated GPUs, such as the iGPUs found in SoCs and APUs, the process is similar, but they do not depend on the system bus because they are integrated on the same chip or package as the CPU. However, it is important to note that they do not have dedicated VRAM; instead, they share system RAM with the CPU.

Today’s GPUs are incredibly fast and capable of displaying images on the screen in real time due to their advanced architecture, highly parallel at the core level, with hundreds or thousands of cores in GPUs such as AMD Radeon, NVIDIA GeForce or Intel Arc. In addition, they employ various techniques to maximize their speed, although this topic is quite complex and deserves its own analysis.

What I want you to understand about parallelism is that the processing units are SIMD, which means they can apply a single instruction to multiple pieces of data at once. With so many cores, the work of a request is distributed among them so that it completes quickly.
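To make the SIMD idea more concrete, here is a minimal Python sketch. It only simulates the concept (real hardware processes the lanes in parallel), and the names `simd_execute` and `split_across_cores` are invented for this illustration.

```python
# Toy model of SIMD: one instruction is applied to every lane of data.
# Real hardware does this in parallel; Python only simulates the idea.

def simd_execute(instruction, lanes_a, lanes_b):
    """Apply a single instruction to all pairs of lane values."""
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]

def split_across_cores(data, num_cores):
    """Split a workload into contiguous chunks, one per (hypothetical) core."""
    chunk = (len(data) + num_cores - 1) // num_cores  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]

def add(a, b):
    return a + b

# One "add" instruction, four pieces of data.
result = simd_execute(add, [1, 2, 3, 4], [10, 20, 30, 40])

# Ten work items distributed over four cores.
chunks = split_across_cores(list(range(10)), 4)
```

Here `result` holds the four sums produced by the single `add` instruction, and `chunks` shows how the ten items would be divided among the cores.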

The instructions sent to the GPU can also contain branches: the GPU has a complete ISA, like that of a CPU, although a specialized one. These branches can reduce performance, just as on the CPU, although this is a separate topic that I won’t cover here. In some cases, not all compute units do useful work for the whole duration of a task. In addition, as with the CPU, memory must be kept coherent.
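One common reason branches cost performance on SIMD hardware is that lanes sharing an instruction stream cannot take different paths at the same time. The sketch below is a simplified model under that assumption: each side of the branch is executed as a separate pass, with a mask deciding which lanes keep the result. The function name and the example branch are hypothetical.

```python
def run_with_divergence(values):
    """Simulate a SIMD group executing:
           if v is even: v = v // 2
           else:         v = v * 3 + 1
    Returns the results and the number of passes that were needed."""
    mask = [v % 2 == 0 for v in values]  # True where the "if" side applies
    out = list(values)
    passes = 0

    if any(mask):  # at least one lane takes the "if" side
        passes += 1
        out = [v // 2 if m else o for v, m, o in zip(values, mask, out)]

    if not all(mask):  # at least one lane takes the "else" side
        passes += 1
        out = [v * 3 + 1 if not m else o for v, m, o in zip(values, mask, out)]

    return out, passes

uniform_result, uniform_passes = run_with_divergence([2, 4, 6, 8])
mixed_result, mixed_passes = run_with_divergence([1, 2, 3, 4])
```

When every lane agrees on the branch, one pass is enough; when the lanes disagree, both sides run and part of the hardware sits idle during each pass.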

What is a graphical API?

It is important to understand what a graphics API is in order to understand the process of generating graphics. These APIs allow the software to send the necessary commands to the GPU to process the graphics or animations.

There are several graphics APIs, such as OpenGL, Vulkan, WebGL or Direct3D (part of DirectX). These APIs are collections of libraries and commands that allow software to use the hardware to create 2D and 3D graphics and animations. To stay up to date, they must expose the modern features that current GPUs support, such as ray tracing and more.

Third-party software, such as a video game, can use the features implemented in the API. When something needs to be drawn on the screen, the software calls the necessary API function or functions. The API, in turn, sends a command to the graphics card or GPU. However, the GPU could not understand these commands without a key piece: the graphics driver. The driver translates the commands into instructions the GPU can understand.

What is a graphics engine?

Before we get into the GPU graphics generation process, it is important to understand what a graphics engine or game engine is. Although it may seem similar to a graphics API, it is development software used to create video games, animations, simulations, and digital twins. Some examples of popular graphics engines are Unreal Engine, CryEngine, Godot, Unity 3D, Source, Rockstar Advanced Engine, and DOOM Engine, among others. Some of these engines were built for one specific game and later reused for other titles.

The graphics engine is a development tool that sits above the graphics API level and uses that API underneath to draw. Basically, these engines provide a framework that makes it easy to create graphics instead of having to build everything from scratch. They include tools such as an integrated development environment (IDE), graphical editors, and 2D or 3D rendering engines.

In short, a graphics engine provides the necessary tools for the content creator to render 2D or 3D graphics, including a physics engine that simulates physical laws (collisions, reflections, waves, etc.), animation, scripting, sound, artificial intelligence for the game, and more.

How is a graphic generated?

Now that we’ve got the basics down, it’s time to look at what actually goes on inside the graphics card or GPU. Putting all of the above together, the graphics stack, or hierarchy of elements involved in creating a computer graphic, is as follows:

- The software that needs to draw the graphic, such as a video game (through its graphics engine) or a program’s graphical window, uses the graphics API to draw it.
- The API, through its functions or commands, communicates what needs to be processed to generate the desired graphic.
- The GPU driver translates these commands into instructions the GPU can understand. This keeps the software independent of the GPU architecture, which can change while remaining compatible.
- Finally, the GPU processes the graphics and stores them in memory so that they can be sent to the screen through the interface it is connected to.
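The layering described above can be sketched as a toy model in Python. Everything here is hypothetical: the class names, the `opcode_table`, and the opcode values are made up for illustration and do not correspond to any real API, driver, or GPU.

```python
class ToyGPU:
    """Stands in for the hardware: it only records the commands it receives."""
    def __init__(self):
        self.command_queue = []

    def submit(self, command):
        self.command_queue.append(command)


class ToyDriver:
    """Translates generic API calls into (made-up) hardware-specific opcodes."""
    def __init__(self, gpu, opcode_table):
        self.gpu = gpu
        self.opcode_table = opcode_table

    def translate_and_submit(self, api_call, args):
        opcode = self.opcode_table[api_call]
        self.gpu.submit((opcode, args))


class ToyGraphicsAPI:
    """What the application sees: functions that do not mention the hardware."""
    def __init__(self, driver):
        self.driver = driver

    def clear(self, color):
        self.driver.translate_and_submit("clear", (color,))

    def draw_triangles(self, vertex_count):
        self.driver.translate_and_submit("draw_triangles", (vertex_count,))


# The "application" only talks to the API.
gpu = ToyGPU()
driver = ToyDriver(gpu, {"clear": 0x01, "draw_triangles": 0x02})
api = ToyGraphicsAPI(driver)
api.clear((0, 0, 0))
api.draw_triangles(3)
```

Swapping in a different `opcode_table` changes what reaches the GPU without touching the application code, which is the independence the driver provides.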

The process for frames to reach the framebuffer and be displayed on the screen involves a series of stages ranging from the simplest elements to the most elaborate images. In the following sections, I will explain this process in detail.

It’s fascinating how the graphics and characters we see in today’s video games originate from simple points that transform into triangles, more complex polygons, and finally gain texture, color, and lighting in a matter of milliseconds.

The CPU starts work…

The first thing to say is that the process begins with the software running on the CPU, which requests these graphics by issuing commands or objects through the graphics API and the other elements I mentioned above.

In this way, the CPU can send the GPU the series of instructions that must be executed to present the graphic on the screen, made possible by the translation that the GPU driver performs. Once they reach the GPU, the work is carried out by its shader units, such as those in AMD’s Compute Units or NVIDIA’s CUDA cores.

GPU follows: Process starts in the rendering pipeline

Once the necessary data and instructions arrive at the GPU, what is known as the rendering pipeline begins. It consists of six stages, described in the following sections. Keep in mind that different GPUs have different architectures and compute units, so the details vary between NVIDIA, AMD and Intel.

Vertex Operation (Per-Vertex Operation)

Once the process starts on the GPU, the vertices are processed by a program called the vertex shader, which the shader units execute as a series of floating-point mathematical operations. By means of matrix multiplications, each vertex is transformed to establish its coordinates in the image to be represented.
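As a rough illustration of this stage, the sketch below transforms one vertex with a 4x4 matrix in plain Python. It assumes an OpenGL-style perspective projection and a simple translation as the model transform; the function names are chosen for this example and the article does not prescribe any particular matrices.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def vertex_shader(position, mvp):
    """Transform a 3D position into clip space using a combined matrix."""
    x, y, z = position
    return mat_vec(mvp, [x, y, z, 1.0])

# Push the object 5 units away from the camera, then project it.
mvp = mat_mul(perspective(90.0, 1.0, 1.0, 10.0), translation(0.0, 0.0, -5.0))
clip = vertex_shader((1.0, 0.0, 0.0), mvp)

# Dividing by w gives normalized device coordinates in the range [-1, 1].
ndc = [c / clip[3] for c in clip[:3]]
```

On a real GPU this same small computation runs once per vertex, across many vertices in parallel.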

Primitive Assembly

The next step is to assemble the primitives, grouping the vertices obtained in the previous step in sets of three. This creates triangles, and with enough of them it is possible to build up a complex image, such as a mountain landscape represented only with triangles.
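A minimal sketch of primitive assembly, assuming the vertices are referenced through an index list in which every three consecutive indices form one triangle (a common arrangement, though not the only one). The function name `assemble_triangles` is hypothetical.

```python
def assemble_triangles(vertices, indices):
    """Group vertices into triangles, three indices at a time."""
    if len(indices) % 3 != 0:
        raise ValueError("index count must be a multiple of 3")
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices), 3)]

# Four corner vertices of a square, described as two triangles.
square_vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_indices = [0, 1, 2, 0, 2, 3]

triangles = assemble_triangles(square_vertices, square_indices)
```

Sharing vertices through indices means the square needs only four vertices even though its two triangles have six corners in total.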

Primitive Processing

The next stage in the GPU pipeline is known as clipping. Only part of the scene is visible on the screen at any time, so everything that falls outside the visible region (called the view volume) must be clipped away. This way, hardware resources are not wasted in the following stages on parts of the graphic that will never be seen.
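The sketch below shows only the simplest part of this stage: discarding triangles that lie completely outside the view volume. It assumes clip-space coordinates `(x, y, z, w)` in which a point is visible when each of x, y and z lies between -w and w (the OpenGL convention). Real clipping also cuts partially visible triangles into smaller ones, which is omitted here. The function names are hypothetical.

```python
def inside_view_volume(vertex):
    """True if a clip-space vertex (x, y, z, w) lies inside the view volume."""
    x, y, z, w = vertex
    return -w <= x <= w and -w <= y <= w and -w <= z <= w

def entirely_outside(triangle):
    """True if all three vertices are beyond the same boundary plane."""
    for axis in range(3):
        if all(v[axis] > v[3] for v in triangle):
            return True
        if all(v[axis] < -v[3] for v in triangle):
            return True
    return False

def discard_invisible(triangles):
    """Keep only triangles that might be at least partially visible."""
    return [t for t in triangles if not entirely_outside(t)]

visible = ((0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0))
off_screen = ((2.0, 0.0, 0.0, 1.0), (3.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0))
partial = ((0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0))

kept = discard_invisible([visible, off_screen, partial])
```

The partially visible triangle is kept by this sketch; a complete implementation would trim it against the boundary it crosses.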

Rasterization

Now that we have the triangles, and only the visible part of the scene remains after clipping, they need to be turned into pixels for the frame. The GPU takes care of this, typically with dedicated rasterizer hardware, by determining which pixels fall inside each polygon.
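One well-known way to decide which pixels a triangle covers is to test each pixel center against the triangle's three edges. The sketch below uses that approach on a tiny grid; it is an illustration only, and actual GPUs use heavily optimized hardware variants. The function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area test: which side of the line a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangle, width, height):
    """Return the (x, y) pixels whose centers are covered by the triangle."""
    a, b, c = triangle
    if edge(a, b, c) == 0:
        return []  # degenerate triangle with no area
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:  # either winding order
                covered.append((x, y))
    return covered

# A right triangle covering the lower-left half of a 4x4 pixel grid.
pixels = rasterize(((0.0, 0.0), (4.0, 0.0), (0.0, 4.0)), 4, 4)
```

Each entry in `pixels` corresponds to one fragment that the next stage will color.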

Fragment Processing

In this step, the GPU runs a program called the fragment shader on its shader units to apply the necessary color, texture, lighting, etc., to the pixels obtained in the previous step. Modern graphics cards also carry out other tasks, such as ray tracing and anti-aliasing.
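As a simplified illustration of what a fragment shader computes, the sketch below looks up a color in a tiny texture and scales it by a basic diffuse (Lambert) lighting term. The lighting model, the nearest-neighbor texture lookup, and all names are assumptions made for this example; real shaders are written in shading languages and are usually far more elaborate.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_texture(texture, u, v):
    """Nearest-neighbor lookup with u and v in the range [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def fragment_shader(texture, uv, normal, light_dir):
    """Return the lit RGB color of one fragment."""
    base = sample_texture(texture, uv[0], uv[1])
    # Diffuse term: surfaces facing the light are brighter.
    intensity = max(0.0, dot(normalize(normal), normalize(light_dir)))
    return tuple(round(c * intensity) for c in base)

# A 2x2 texture: red, green / blue, white.
texture = [[(255, 0, 0), (0, 255, 0)],
           [(0, 0, 255), (255, 255, 255)]]

facing_light = fragment_shader(texture, (0.75, 0.25), (0, 0, 1), (0, 0, 1))
facing_away = fragment_shader(texture, (0.75, 0.25), (0, 0, 1), (0, 0, -1))
```

A surface facing the light keeps its full texture color, while one facing away from it comes out black.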

Per-Fragment Operation

Now the pixels of the graphic you want to generate already have their color, texture, position, and everything else they need. This is the moment when the GPU saves them in the framebuffer. Specifically, the finished frames are stored in what is called the default framebuffer.
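The text above describes this stage as writing the finished pixels to the framebuffer. A typical check performed at this point, not covered above and included here as an assumption, is the depth test: a fragment is only written if it is closer to the camera than what is already stored for that pixel. The names in this sketch are hypothetical.

```python
def make_framebuffer(width, height, clear_color=(0, 0, 0)):
    """Create a color buffer and a depth buffer of the same size."""
    color = [[clear_color] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth

def write_fragment(color, depth, x, y, z, rgb):
    """Write the fragment only if it is nearer than the stored depth."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False

color, depth = make_framebuffer(2, 2)

far_written = write_fragment(color, depth, 0, 0, 0.9, (255, 0, 0))    # far, red
near_written = write_fragment(color, depth, 0, 0, 0.2, (0, 255, 0))   # near, green
hidden_written = write_fragment(color, depth, 0, 0, 0.5, (0, 0, 255)) # behind green
```

After the three writes, the pixel holds the color of the nearest fragment regardless of the order in which the fragments arrived.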

For some analog interfaces, such as the old VGA, DVI-A, RGA, S-Video, Composite Video, etc., an additional step was needed to transform the framebuffer data, i.e. the frames, into an analog signal. For this, an additional chip called a RAMDAC (Random Access Memory Digital-to-Analog Converter) was used, which converted the digital data coming out of the GPU into an analog signal that older screens could interpret. Today’s interfaces are digital, so no such conversion is necessary, and frames can go directly from the buffer to the monitor.

Finally, the screen displays the rendered image

And, to finish the process, once the frames are in the buffer, they are sent at a certain rate, measured in frames per second (FPS), over the bus or interface that connects the screen to the graphics card so that the screen refreshes the image. The whole process takes very little time: when you perform an action, the result appears on the screen almost instantly.
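To put "very little time" into numbers, the frame rate can be converted into the time available to produce each frame. This is simple arithmetic; the function name is chosen for this example.

```python
def frame_time_ms(fps):
    """Time budget per frame, in milliseconds, for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

budget_60 = frame_time_ms(60)    # about 16.7 ms per frame
budget_144 = frame_time_ms(144)  # about 6.9 ms per frame
```

At 60 FPS, every stage described above has to finish for the whole frame in roughly 16.7 milliseconds.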