How can language models be taught to program better?

Creating efficient algorithms is crucial in many areas, from reducing power consumption in digital devices to developing next-generation technologies. However, designing these algorithms remains a difficult task for artificial intelligence systems. In this context, self-play, the technique that has helped AI systems dominate games such as chess and Go, is presented as a way forward.

In a recent article, Adam Tauman Kalai, Senior Principal Researcher, and Patrick Haluptzok, AI Research Associate, present a new approach to improving the programming skills of AI systems through self-play. In particular, the authors focus on how language models can learn to program better by practicing on problems they generate themselves.

The challenge and the solution

The difficulty is that an artificial intelligence system must generate new algorithmic programming problems without knowing their solutions beforehand. The solution proposed in the article relies on programming puzzles, introduced by Microsoft Research in 2021, whose answers are easy to verify for correctness but often difficult to find.
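To make "easy to verify, difficult to solve" concrete, here is a minimal sketch of what such a puzzle could look like in Python. The names f, g and verify, and the subset-sum instance itself, are assumptions chosen for illustration and are not taken from the article.

```python
from collections import Counter

# Hypothetical puzzle instance (not from the article).
NUMBERS = [3, 34, 4, 12, 5, 2]
TARGET = 9


def f(subset: list) -> bool:
    """Puzzle: choose some of NUMBERS whose sum is TARGET.

    Checking a proposed answer is cheap; finding one can require search.
    """
    available = Counter(NUMBERS)
    chosen = Counter(subset)
    # Each value may be used at most as often as it appears in NUMBERS.
    uses_only_available = all(chosen[v] <= available[v] for v in chosen)
    return uses_only_available and sum(subset) == TARGET


def g() -> list:
    """A candidate solution, as a solver (human or model) might propose it."""
    return [4, 5]


def verify(puzzle, solver) -> bool:
    """A solution is correct exactly when the puzzle returns True on it."""
    return puzzle(solver()) is True


print(verify(f, g))
```

The point of the design is that no reference answer is needed: the puzzle function is its own test.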

These puzzles serve as a starting point from which language models generate and solve their own programming challenges. The models can thus practice on millions of synthetic challenges and explore types of problems not found in public repositories.
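The generate, solve and verify loop described above could be sketched roughly as follows. This is an assumption-laden illustration, not the authors' pipeline: the candidate list stands in for language model output, the function names run_candidate and filter_verified are hypothetical, and a real system would run untrusted code in a sandbox with time limits rather than calling exec directly.

```python
def run_candidate(puzzle_src: str, solution_src: str) -> bool:
    """Return True only if the solution code satisfies the puzzle code."""
    namespace = {}
    try:
        exec(puzzle_src, namespace)    # expected to define f
        exec(solution_src, namespace)  # expected to define g
        return namespace["f"](namespace["g"]()) is True
    except Exception:
        # Crashing or malformed candidates are simply discarded.
        return False


def filter_verified(candidates):
    """Keep only the (puzzle, solution) pairs that pass verification."""
    return [(p, s) for p, s in candidates if run_candidate(p, s)]


# Hand-written stand-ins for model-generated puzzles and solutions.
candidates = [
    ("def f(x):\n    return x + 3 == 10", "def g():\n    return 7"),
    ("def f(x):\n    return x + 3 == 10", "def g():\n    return 8"),  # wrong answer
    ("def f(s):\n    return s[::-1] == 'olleh'", "def g():\n    return 'hello'"),
    ("def f(x):\n    return x / 0 == 1", "def g():\n    return 1"),   # raises an error
]

verified = filter_verified(candidates)
print(len(verified), "of", len(candidates), "candidates verified")
```

Only pairs that pass verification survive, which is what would make them usable as practice material without any human-written answer key.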

Examples of programming puzzles for AI self-play

The authors of the article present three examples of programming puzzles:

Example 1: Towers of Hanoi

The objective of the well-known Towers of Hanoi puzzle is to move all the disks from the first tower to the third tower, one by one, without ever putting a larger disk on top of a smaller one. Although the number of steps required to solve the puzzle is exponential in the number of disks, a solution exists in the form of a short program that is often used to teach recursion.
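The short recursive program mentioned above can be sketched as follows, together with a checker that replays the moves. The function names, the 0-based tower indices and the choice of 8 disks are assumptions made for this illustration; the article's own code may differ.

```python
def hanoi_moves(n, source=0, spare=1, target=2):
    """Return the list of (from_tower, to_tower) moves that transfers
    n disks from source to target."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, target, spare)    # park n-1 disks on spare
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, source, target)  # bring the n-1 disks over
    )


def f(moves, num_disks=8):
    """Puzzle checker: replay the moves and confirm they are all legal
    and leave every disk on the third tower."""
    # Each tower is a stack; the last list element is the top disk.
    towers = [list(range(num_disks, 0, -1)), [], []]
    for src, dst in moves:
        if not towers[src]:
            return False  # nothing to move
        disk = towers[src].pop()
        if towers[dst] and towers[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        towers[dst].append(disk)
    return not towers[0] and not towers[1] and len(towers[2]) == num_disks


solution = hanoi_moves(8)
print(len(solution), f(solution))  # 255 moves for 8 disks, i.e. 2**8 - 1
```

The program is a few lines long even though the move list it produces doubles in length with every added disk.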

Example 2: String Challenge

This puzzle requires a string that contains 1000 “A” characters but no two consecutive “A”s. Most programmers come up with solutions like “ABABAB…” (“AB” repeated 1000 times), which is generated by the compact Python program presented in the article.
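A plausible reconstruction of this puzzle and of the compact solution is sketched below. The function names f and g and the exact form of the check are assumptions; the article's own code is not reproduced here.

```python
def f(s: str) -> bool:
    """Puzzle: the string must contain 1000 'A' characters
    and never two of them in a row."""
    return s.count("A") == 1000 and "AA" not in s


def g() -> str:
    """The 'ABABAB...' answer: the pair 'AB' repeated 1000 times."""
    return "AB" * 1000


print(f(g()))
```

Any separator works in place of "B", and a slightly shorter answer such as "A" + "BA" * 999 also satisfies the check, which shows that a puzzle can have many valid solutions.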

Example 3: Factorization of integers

This puzzle asks for a factor of a relatively small number, so it can be solved quickly with a simple loop. However, the data set also contains much larger and harder factoring challenges.
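The simple loop mentioned above might look like the following sketch. The number 8051 is a hypothetical small instance chosen for illustration (it equals 83 * 97), and the names f and g are likewise assumptions rather than the article's code.

```python
N = 8051  # hypothetical small instance; 8051 = 83 * 97


def f(d: int, n: int = N) -> bool:
    """Puzzle: d must be a nontrivial factor of n."""
    return 1 < d < n and n % d == 0


def g(n: int = N) -> int:
    """Trial division: return the smallest nontrivial factor of n."""
    for d in range(2, n):
        if n % d == 0:
            return d
    raise ValueError("n has no nontrivial factor")


print(g(), f(g()))
```

The checker stays a single line no matter how large n is, whereas this loop becomes impractical once n has hundreds of digits. That gap is what separates the easy instances from the hard ones.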

Advantages and risks

The self-play technique presented in the article has several advantages, chief among them that language models can generate and solve their own programming problems. However, it also carries risks and limitations, such as the potential for unintended consequences as the programming capabilities of AI systems increase.