Undocumented instructions: all the secrets

Undocumented instructions, or undocumented op_codes, are present in many old and current processors. Yet many people have never heard of them, and those most concerned about security sometimes find them mysterious or even suspicious. In this article we are going to see what these instructions are and why they are there…

Table of contents

  • What is an instruction?
    • Instruction Types
  • Undocumented instructions (illegal op_code)
    • History

What is an instruction?

An instruction, operation code or op_code, as I have explained many times, is a binary code that, when it reaches the CPU, the control unit decodes (using its decoder) to determine which operation it has to perform on the data that accompanies it, the operands. For example, an instruction can be ADD (addition), SUB (subtraction), MUL (multiplication), DIV (division), AND (logical AND), OR (logical OR), etc., that is, arithmetic or logical operations applied to integer or floating-point data.

As you should know, a program is nothing more than a sequence of instructions of this type. For example, let's look at the C source code of a simple program that displays "Hello world!":

#include <stdio.h>

int main(void) {
    printf("Hello world!\n");
    return 0;
}

If we write the same program in assembly language, for example for the x86-64 ISA (NASM syntax on Linux, calling the operating system directly instead of using printf), we get these instructions, which the CPU must process for the program to run:

global _start

section .text

_start:
    mov rax, 1        ; system call 1: write
    mov rdi, 1        ; to standard output (the screen)
    mov rsi, msg      ; the "Hello world!" message
    mov rdx, msglen   ; with a given length
    syscall           ; the system call (syscall) that performs the write

    mov rax, 60       ; finally, system call 60: exit
    mov rdi, 0        ; load the exit status (0)
    syscall           ; the syscall that ends the program

section .rodata

    msg: db "Hello world!", 10   ; 10 is the newline character
    msglen: equ $ - msg

As you can see, what the CPU has to execute for this program to work is a series of MOV instructions, which move data into registers, and the SYSCALL instruction, which performs the system call on x86-64 (on older 32-bit x86, the INT instruction was used instead, for example int 0x80 on Linux). Apart from the instructions there is the data, which in this program consists of a few constants and the message "Hello world!"…

These op_codes or instructions are the ones defined in the CPU's ISA, that is, in its architecture. Therefore, if the ISA changes, the assembly code must change too so that the CPU can understand it. Assembly code is nothing more than a series of mnemonics that represent the instructions of the ISA.

Once this code is assembled into binary, that is, into machine code made of ones and zeros, the CPU and its electronics can understand it.

When the executable binary is run, the operating system loads it into RAM as a process, and from there the CPU begins to fetch the instructions and data that make up the program. It does so sequentially, that is, in order, instruction by instruction.

When, for example, the first MOV instruction arrives at the CPU, it first goes to the control unit, specifically to the decoder. The decoder (helped by microcode in CPUs that use it) knows how to interpret the instruction, which in this case represents moving data into a register.

Once it is decoded, the control unit knows that it is a move instruction, so it generates the control signals that tell the execution units what to do: in this case, the move, but it would be the same for an addition, a multiplication, etc. This is the foundation of how computing works.

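That is, MOV RAX, 1 corresponds in x86-64 to a series of binary codes, for example the bytes b8 01 00 00 00: b8 is the op_code for moving a 32-bit constant into EAX (which also clears the upper half of RAX), and 01 00 00 00 is the constant 1 stored in little-endian order. What the instruction does is write the constant, in this case 1, into the CPU's RAX register. When this code arrives, the decoder interprets it; on many CPUs, complex instructions are translated with the help of a microcode ROM, while simple ones like this MOV are usually decoded directly by hardware.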

Why am I telling you this? Simply so that you understand what an op_code or instruction is and how it works.

Instruction Types

As you can deduce, there are op_codes or instructions of various types, so that the CPU can execute any program. In the example above we only wrote a simple hello world, for which MOV and SYSCALL instructions are enough. Now let's look at this other C code, which adds two integers:

#include <stdio.h>

int main() {
    int num1, num2, suma;

    printf("Enter two integers: ");
    scanf("%d %d", &num1, &num2);

    // compute the sum
    suma = num1 + num2;

    // display the result
    printf("%d + %d = %d", num1, num2, suma);
    return 0;
}

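Compiled for x86-64, the assembly for this program would look like the following (Intel syntax, similar to what GCC produces without optimizations):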

.LC0:
    .string "Enter two integers: "
.LC1:
    .string "%d %d"
.LC2:
    .string "%d + %d = %d"
main:
    push rbp
    mov rbp, rsp
    sub rsp, 16
    mov edi, OFFSET FLAT:.LC0
    mov eax, 0
    call printf
    lea rdx, [rbp-12]
    lea rax, [rbp-8]
    mov rsi, rax
    mov edi, OFFSET FLAT:.LC1
    mov eax, 0
    call __isoc99_scanf
    mov edx, DWORD PTR [rbp-8]
    mov eax, DWORD PTR [rbp-12]
    add eax, edx
    mov DWORD PTR [rbp-4], eax
    mov edx, DWORD PTR [rbp-12]
    mov eax, DWORD PTR [rbp-8]
    mov ecx, DWORD PTR [rbp-4]
    mov esi, eax
    mov edi, OFFSET FLAT:.LC2
    mov eax, 0
    call printf
    mov eax, 0
    leave
    ret

As you can see, this other program uses other instructions as well, such as LEA, CALL or ADD. ADD EAX, EDX, for example, adds the values of num1 and num2 that were previously loaded into those registers.

My point is that instructions are not only for moving data; we can also find other types, such as:

    • Data transfer: they move data from one register to another or between memory and registers, load a constant or variable into a register, etc. This is the case of the MOV instruction we saw earlier.
    • Arithmetic: these are instructions or op_codes that, once decoded, are translated into signals telling the ALU to perform an arithmetic operation on the operands. ADD, SUB, MUL, DIV, etc., are of this type. ADD EAX, EDX in the example above adds the contents of the EAX register to that of EDX and stores the result back in EAX.
    • Logical: of course, there are also instructions that perform logical operations on bits, such as AND, OR, NOT, XOR, etc.
    • Control: another important type of instruction is the one that can alter the PC (Program Counter) register of the CPU. I said before that the CPU processes the program's instructions in order, sequentially. To do this, after each instruction it advances the PC by the size of that instruction so that it points to the next one (on x86-64, where instructions have variable length, this register is called RIP). Altering this register is necessary when the program must not follow that sequential order, as in conditional jumps, loops, calls, system calls, etc. Examples of this type of instruction are JMP, CALL, RET, SYSCALL, etc.
    • I/O: finally, there are instructions for input and output operations, which read from or write to peripherals through a separate address space of I/O ports instead of memory. For example, on x86, IN and OUT, which take the port number as an operand.


Undocumented instructions (illegal op_code)

An illegal op_code, also called an undocumented or unintended instruction, is an instruction that the CPU manufacturer has not documented for programmers to use. There can be several reasons for this:

  • The CPU designer may not even know it exists: the code was simply left undefined in the ISA, but the hardware still does something with it by mistake. In these cases, using it can cause erratic or unwanted behavior. A well-designed CPU should instead raise an exception or fault condition when it tries to execute one of these undefined instructions.
  • In other cases, the designer may have included them on purpose for specific internal tasks that are not meant to be used directly by programmers, that is, instructions that are opaque to developers and that may serve, for example, to speed up certain tasks.
  • Another case could be instructions that are not used during normal operation of the CPU, but that are included to run certain tests during the validation of engineering samples, etc.
  • In some cases it could be to prevent or hinder reverse engineering.
  • And it could even be for much more obscure reasons, such as hidden instructions that, when executed with certain codes, perform malicious tasks.

In current CPUs, a bug of this kind can often be corrected with a microcode or firmware update. In the past this was much more difficult: with microcode stored in ROM or with hardwired control units, there was simply nothing you could do…

Be that as it may, these illegal instructions are more common than many believe.

History

For example, some of the first processors in which illegal op_codes were detected were the Intel 8086, the Zilog Z80, the Texas Instruments TMS9900 and the MOS Technology 6502, all from the 1970s. But they are not only present in these old CPUs; they also exist in current ones.

Many users have discovered this type of instruction using fuzzing techniques, that is, by trying large numbers of instruction encodings and observing which ones the CPU executes instead of rejecting, which has revealed undocumented instructions in many CPU models. A very current case is that of Apple's AMX instructions.

 

by Abdullah Sam
I'm a teacher, researcher and writer. I write about study subjects to improve the learning of college and university students. I write high-quality study notes, mostly on tech, games, education, and solutions, tips and tricks. I am a person who helps students acquire knowledge, competence or virtue.
