How to learn from interaction with users

Have you ever wondered how machines learn to perform tasks? How can recommendation algorithms know what products to suggest to you without even asking you directly? Well, this is due to reinforcement learning, a type of machine learning that uses rewards to guide models toward valuable behaviors. But how are these rewards defined? And what happens when the rewards are not explicit or easily measurable?


This is the problem addressed by research presented at the 2023 International Conference on Learning Representations (ICLR), which proposes a novel approach to learning inferred rewards. Instead of relying on explicit rewards, an approach called Interaction-Grounded Learning (IGL) allows models to infer latent rewards through interaction with the environment and with users.

Reinforcement learning and the complexity of defining rewards

Before delving into the IGL approach, it is important to understand the concept of reinforcement learning. In reinforcement learning, an agent (a model) learns to perform a task by interacting with an environment and receiving rewards (or punishments) for its actions. The agent learns to maximize the total reward over time, which means that it learns to take actions that lead to more valuable rewards.
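The loop described above can be sketched in a few lines. This is a minimal, illustrative example: the two-action bandit environment and its reward probabilities are invented for this sketch and are not from the research.

```python
import random

REWARD_PROB = {"a": 0.2, "b": 0.8}  # hidden per-action reward rates (assumed)

def pull(action: str) -> float:
    """Environment: returns a stochastic reward for the chosen action."""
    return 1.0 if random.random() < REWARD_PROB[action] else 0.0

def run(steps: int = 5000, eps: float = 0.1) -> str:
    """Epsilon-greedy agent: explore occasionally, otherwise exploit."""
    value = {"a": 0.0, "b": 0.0}  # estimated value of each action
    count = {"a": 0, "b": 0}
    for _ in range(steps):
        if random.random() < eps:
            action = random.choice(list(value))
        else:
            action = max(value, key=value.get)
        reward = pull(action)
        count[action] += 1
        # Update the running mean of the observed rewards for this action.
        value[action] += (reward - value[action]) / count[action]
    return max(value, key=value.get)

print(run())  # the agent converges on the higher-reward action "b"
```

Over time the agent's value estimates approach the hidden reward rates, so it ends up preferring the action that pays off more often.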

However, the definition of rewards can be very complex. For example, in a recommendation system, how is the reward defined? Is it simply the fact that a user clicks on a suggested product? Or should you take into account factors such as the user’s long-term satisfaction with the product? How is long-term satisfaction defined and measured? These are difficult questions to answer, and the complexity only increases when there are differences in how users interact with the system.

Interaction-Grounded Learning (IGL)

This is where the IGL approach comes in. Instead of relying on explicit rewards, IGL uses more subtle feedback signals, such as user clicks, to infer a latent reward. This reward is not directly observable, but can be inferred from the feedback signals.
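The key structural difference from ordinary reinforcement learning can be made concrete: the agent never sees a reward, only a feedback signal, and a reward decoder turns that signal into an inferred latent reward. The hidden user model, the signal names, and the decoder below are all invented for illustration and are not the paper's actual method.

```python
import random

ACTIONS = ["news", "sports"]

def user_feedback(action: str) -> str:
    """Hidden user model (assumed): this user likes sports content."""
    liked = action == "sports"
    return "click" if (liked and random.random() < 0.9) else "skip"

def decode_reward(feedback: str) -> float:
    """Reward decoder: maps the observable feedback to a latent reward."""
    return 1.0 if feedback == "click" else 0.0

def interact(steps: int = 2000, eps: float = 0.1) -> str:
    value = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        action = (random.choice(ACTIONS) if random.random() < eps
                  else max(value, key=value.get))
        feedback = user_feedback(action)  # observable signal, not a reward
        reward = decode_reward(feedback)  # inferred latent reward
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return max(value, key=value.get)

print(interact())  # converges on "sports" using only decoded feedback
```

The point of the sketch is the separation of concerns: the environment emits feedback, the decoder infers reward, and the policy update never touches a ground-truth reward signal.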


The IGL approach proposed in the research presented at ICLR 2023 is IGL-P: the first IGL approach for context-dependent feedback, the first use of inverse kinematics as an IGL objective, and the first IGL approach to handle more than two latent states. IGL-P learns to maximize this latent reward through interaction with the environment and users, enabling more effective personalization of the user experience.


The IGL approach is particularly useful for interactive learning applications, such as recommender systems. Recommender systems are used to provide personalized content suggestions to users based on their interaction history. However, it is difficult to define an explicit reward for recommender systems. Therefore, modern recommender systems use implicit feedback signals, such as user clicks, to infer whether a user is satisfied with the recommendation.

However, these implicit signals are not always an accurate measure of user satisfaction. For example, a user might click on an article just because the headline is sensational, only to discover that the content is not relevant or of high quality. This can lead to a poor user experience and damage the reputation of the recommender system.
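The clickbait problem above can be illustrated with a toy reward decoder that combines a click with a second implicit signal. The dwell-time feature and its threshold are invented assumptions for this sketch, not signals from the paper.

```python
def decode_reward(clicked: bool, dwell_seconds: float) -> float:
    """Toy decoder: a click alone is not enough to count as satisfaction."""
    if not clicked:
        return 0.0
    # A click followed by an immediate bounce suggests the headline,
    # not the content, drove the interaction (threshold is illustrative).
    return 1.0 if dwell_seconds >= 10.0 else 0.0

print(decode_reward(True, 45.0))   # satisfied read  -> 1.0
print(decode_reward(True, 2.0))    # clickbait bounce -> 0.0
print(decode_reward(False, 0.0))   # no click        -> 0.0
```

Even this crude combination of signals discounts the misleading click, which is the intuition behind inferring a latent reward from multiple feedback channels rather than trusting any single one.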


IGL-P provides a scalable and effective solution for personalized agent learning, which can significantly improve the user experience. IGL-P learns custom reward functions for different users instead of relying on a fixed, human-designed reward function. It learns from varied feedback signals and uses inverse kinematics to infer the underlying latent reward.
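The inverse-kinematics idea mentioned above can be sketched loosely: fit a model that predicts which action was taken from the feedback the user produced. If feedback reliably identifies the action, it carries structure that can be exploited when decoding the latent reward. The synthetic logs and the simple count-based predictor below are illustrative assumptions, not the paper's actual estimator.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

ACTIONS = ["article_a", "article_b"]

def sample_feedback(action: str) -> str:
    """Hidden user behavior (assumed): action-specific click tendencies."""
    p_click = 0.8 if action == "article_a" else 0.2
    return "click" if random.random() < p_click else "skip"

# Collect interaction logs of (action, feedback) pairs.
logs = [(a, sample_feedback(a)) for _ in range(5000) for a in ACTIONS]

# "Inverse kinematics", loosely: estimate P(action | feedback) from logs.
by_feedback = defaultdict(Counter)
for action, feedback in logs:
    by_feedback[feedback][action] += 1

def predict_action(feedback: str) -> str:
    return by_feedback[feedback].most_common(1)[0][0]

print(predict_action("click"))  # "article_a" elicits clicks most often
print(predict_action("skip"))   # "article_b" is the usual cause of a skip
```

The predictor recovers which action most plausibly caused each feedback signal, which is the direction of inference (from effect back to cause) that gives the objective its name.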

The effectiveness of IGL-P was demonstrated in experiments using simulations and real-world production traces. The results showed that IGL-P can learn to distinguish between different modes of user communication and can significantly improve the effectiveness of recommender systems by learning to maximize the underlying latent reward. IGL-P has also been shown to outperform current personalized agent learning methods that require high-dimensional parameter tuning, human-designed rewards, or extensive and expensive user studies.

Research presented at ICLR 2023 highlights the importance of reinforcement learning in machine learning, as well as the complexity of defining rewards. The IGL approach provides an effective solution for personalized agent learning in interactive applications, such as recommender systems. IGL-P represents a significant improvement over current methods for learning personalized agents and can markedly improve the user experience.

