CLUE: Calibrated Latent Guidance for Advancing Offline Reinforcement Learning

Offline Reinforcement Learning (RL) offers a compelling way to train intelligent agents from pre-collected datasets, circumventing the costly and time-intensive data acquisition required by online RL. A significant bottleneck, however, is that every transition in the offline dataset must be annotated with a carefully defined extrinsic reward. To address this labor-intensive reward-engineering challenge, the authors introduce Calibrated Latent gUidancE (CLUE), a method that uses a small amount of expert data to derive intrinsic rewards, removing the reliance on hand-crafted extrinsic rewards.

CLUE employs a conditional variational auto-encoder to construct a latent space in which intrinsic rewards can be quantified directly. Its core idea is to align these intrinsic rewards with expert intentions: the embeddings of the expert data are calibrated to a contextual representation that reflects expert knowledge, and rewards are then measured with respect to that representation.
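To make this mechanism concrete, below is a minimal sketch of how intrinsic rewards could be computed in such a latent space. The encoder architecture, the `calibrate_expert_embedding` helper (which simply averages expert embeddings into one contextual vector), and the negative-distance reward are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CVAEEncoder(nn.Module):
    """Illustrative conditional encoder: maps (state, action) pairs to a latent embedding."""
    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)       # mean of q(z | s, a)
        self.log_std = nn.Linear(hidden, latent_dim)  # log-std of q(z | s, a)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        return self.mu(h), self.log_std(h)

def calibrate_expert_embedding(encoder, expert_states, expert_actions):
    """Hypothetical calibration step: collapse the expert embeddings into a single
    contextual representation (here, their mean) that stands in for expert intent."""
    with torch.no_grad():
        mu, _ = encoder(expert_states, expert_actions)
    return mu.mean(dim=0)

def intrinsic_reward(encoder, states, actions, expert_embedding, scale=1.0):
    """Assumed reward shape: transitions whose embeddings lie close to the calibrated
    expert embedding receive high intrinsic reward (negative squared distance)."""
    with torch.no_grad():
        mu, _ = encoder(states, actions)
    dist = ((mu - expert_embedding) ** 2).sum(dim=-1)
    return -scale * dist

if __name__ == "__main__":
    # Toy usage: label a reward-free batch using a handful of expert transitions.
    state_dim, action_dim = 17, 6
    encoder = CVAEEncoder(state_dim, action_dim)  # assumed pre-trained as part of a CVAE
    expert_s, expert_a = torch.randn(32, state_dim), torch.randn(32, action_dim)
    batch_s, batch_a = torch.randn(256, state_dim), torch.randn(256, action_dim)

    z_expert = calibrate_expert_embedding(encoder, expert_s, expert_a)
    rewards = intrinsic_reward(encoder, batch_s, batch_a, z_expert)
    print(rewards.shape)  # torch.Size([256])
```

The labeled rewards can then be fed to any standard offline RL algorithm in place of the missing extrinsic rewards.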

CLUE's versatility is demonstrated across a range of offline RL tasks: it improves performance in sparse-reward offline RL, outperforms state-of-the-art offline Imitation Learning (IL) methods, and enables the discovery of diverse skills from static, reward-free offline datasets. Empirical evaluations confirm that CLUE not only improves learning outcomes in challenging settings but also opens new avenues for leveraging expert knowledge in offline learning.
