ICLR 2026
APPLE: Toward General Active Perception via Reinforcement Learning
Active perception is a fundamental skill that enables humans to deal with uncertainty in our inherently
partially observable environment.
For senses such as touch, where the information is sparse and local, active perception becomes crucial. In
recent years, active perception has emerged as an important research domain in robotics.
However, current methods are often tied to specific tasks or rely on strong assumptions, which limits their
generality.
To address this gap, this work introduces APPLE (Active Perception Policy Learning), a novel framework that
leverages reinforcement learning (RL) to address a range of different active perception problems.
APPLE jointly trains a transformer-based perception module and a decision-making policy under a unified
optimization objective, learning how to actively gather information.
By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active
perception problems.
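To make the core recipe concrete, here is a minimal toy sketch of that idea: a single reward signal drives both where to sense and what to report. This is an illustrative assumption, not the paper's implementation; the two-sensor environment, the tabular policy, and all names here are hypothetical stand-ins for APPLE's transformer-based modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy active perception environment (illustrative, not from the paper):
# sensor 0 reports the hidden binary label with 90% accuracy,
# sensor 1 returns pure noise. The policy must learn to query sensor 0.
def sense(sensor, label):
    if sensor == 0:
        return label if rng.random() < 0.9 else 1 - label
    return int(rng.random() < 0.5)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

theta = np.zeros(2)   # policy logits over the two sensors
baseline = 0.0        # running reward baseline for variance reduction
lr = 0.5

for step in range(2000):
    label = int(rng.random() < 0.5)
    probs = softmax(theta)
    sensor = rng.choice(2, p=probs)        # active sensing action
    obs = sense(sensor, label)
    pred = obs                             # trivial "perception module": trust the reading
    reward = float(pred == label)          # unified objective: perception accuracy
    # REINFORCE update: the query policy is trained by the same reward
    grad = -probs
    grad[sensor] += 1.0
    theta += lr * (reward - baseline) * grad
    baseline += 0.05 * (reward - baseline)

print(softmax(theta))  # probability mass should concentrate on sensor 0
```

The point of the sketch is the coupling: the sensing policy receives no hand-designed exploration reward, only the downstream prediction accuracy, which is the spirit of training perception and decision-making under one objective.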
We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the
Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, which achieves strong performance on
both regression and classification tasks.
These findings underscore the potential of APPLE as a versatile and general framework for advancing active
perception in robotics.
TL;DR
APPLE is a reinforcement learning framework for solving active perception
problems.
We evaluate it on a range of different tasks, focusing especially on tactile active perception.
Demos: Tasks & Policies
This section provides demos showcasing the capabilities of APPLE policies and baselines on various tasks.
Select a Task and a Policy to view the corresponding rollout video.
Spotlight: APPLE on the Toolbox Environment
Below is a closer look at a learned APPLE-CrossQ policy on the Toolbox environment.
In this environment, the agent has to find the pose of the wrench in the workspace from touch alone.
We can see that APPLE-CrossQ learned an intelligent policy that decomposes into three distinct phases:
1) Search. The agent explores the workspace in a circular search pattern to locate the object.
2) Contact. It establishes initial contact with the tool. At this stage, the pose of the wrench is still
ambiguous: it is unclear in which direction the open end of the wrench points, and the agent is also
uncertain about its position along the handle.
3) Disambiguation. It slides along the tool's handle to resolve this ambiguity and reports the correct
pose of the wrench.
Citation
If you find APPLE useful, please cite:
@inproceedings{schneider2026apple,
title = {APPLE: Toward General Active Perception via Reinforcement Learning},
author = {Schneider, Tim and de Farias, Cristiana and Calandra, Roberto and Chen, Liming and Peters, Jan},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}