APPLE: Toward General Active Perception via Reinforcement Learning
ICLR 2026

Active perception is a fundamental skill that enables humans to deal with uncertainty in inherently partially observable environments. For senses such as touch, where information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limits their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning), a novel framework that leverages reinforcement learning (RL) to address a range of active perception problems. APPLE jointly trains a transformer-based perception module and a decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, which achieves strong performance on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics.
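To make the idea of a unified objective concrete, here is a minimal, purely illustrative sketch of jointly training a perception module and a sensing policy. It is not APPLE's actual implementation: a linear encoder stands in for the transformer perception module, and all names (`step`, `W_enc`, `W_pi`, `W_cls`) and the specific reward shaping are assumptions made for this sketch.

```python
import numpy as np

# Illustrative stand-in for APPLE's architecture: a shared perception module
# feeds two heads, one choosing the next sensing action and one emitting the
# current task prediction. All shapes and names are hypothetical.
rng = np.random.default_rng(0)
D_OBS, D_FEAT, N_ACTIONS, N_CLASSES = 4, 8, 3, 2

W_enc = rng.normal(scale=0.1, size=(D_OBS, D_FEAT))      # perception module (stand-in)
W_pi = rng.normal(scale=0.1, size=(D_FEAT, N_ACTIONS))   # sensing-action policy head
W_cls = rng.normal(scale=0.1, size=(D_FEAT, N_CLASSES))  # prediction head

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(obs_history):
    # Pool the observation history into one feature vector (where the real
    # framework would apply a transformer), then compute both the next
    # sensing action and the current class belief from the shared features.
    feat = np.tanh(np.mean(obs_history, axis=0) @ W_enc)
    action_probs = softmax(feat @ W_pi)   # where to touch/sense next
    class_probs = softmax(feat @ W_cls)   # current task prediction
    return action_probs, class_probs

obs_history = [rng.normal(size=D_OBS)]
action_probs, class_probs = step(obs_history)

# Unified per-step objective (REINFORCE-style sketch): the policy term is
# weighted by a reward tied to prediction quality, so gradients from both
# terms flow into the shared perception features.
true_class = 0
reward = class_probs[true_class]
loss = -np.log(action_probs[0]) * reward - np.log(class_probs[true_class])
```

Because both heads read the same features, the perception module is shaped by what helps the final prediction and by what makes informative sensing actions possible, which is the intuition behind training both with one objective.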

TL;DR
APPLE is a reinforcement learning framework for solving active perception problems. We evaluate it on a range of tasks, with a particular focus on tactile active perception.

Demos: Tasks & Policies

This section provides demos showcasing the capabilities of APPLE policies and baselines on various tasks.


Spotlight: APPLE on the Toolbox Environment

Below is a closer look at a learned APPLE-CrossQ policy on the Toolbox environment, in which the agent must determine the pose of a wrench in the workspace from touch alone. The learned policy decomposes into three distinct phases. First, the agent searches the workspace for the wrench in a circular pattern. It then establishes initial contact. At this stage, the wrench's pose is still ambiguous: it is unclear in which direction the open end points, and the contact position along the handle is uncertain. Finally, the agent slides along the handle to resolve this ambiguity and reports the correct pose of the wrench.

1) Search. The agent explores the workspace to locate the object.
2) Contact. It establishes initial contact with the tool.
3) Disambiguation. It slides along the tool's handle to resolve the remaining ambiguity about the tool's pose.

Video

Citation

If you find APPLE useful, please cite:

@inproceedings{schneider2026apple,
  title     = {APPLE: Toward General Active Perception via Reinforcement Learning},
  author    = {Schneider, Tim and de Farias, Cristiana and Calandra, Roberto and Chen, Liming and Peters, Jan},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}