CAFÉ
101 N. Atherton Building
101 N. Atherton
State College, PA
Title: Reinforcement Learning with Linear Temporal Logic Specifications
Speaker: Abhinav Verma
Abstract:
Specifications in linear temporal logic (LTL) offer a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, standard RL frameworks can be too myopic to find maximally satisfying policies. In this talk, we will discuss eventual discounting, a value-function-based proxy under which one can find policies that satisfy a specification with the highest achievable probability. To improve the efficiency of learning from specifications, we combine eventual discounting with LTL-guided Counterfactual Experience Replay, a method for generating off-policy data from on-policy rollouts via counterfactual reasoning. Finally, we will discuss a mechanism for exploiting the compositionality of a specification to provide formal guarantees on the behavior of learnt policies for reach-avoid tasks.
Please RSVP if you plan to attend.