Understanding Failures of Deterministic Actor-Critic with Continuous Action Spaces and Sparse Rewards
Conference paper, year: 2020. Domain: Robotic Systems, Design and Control


Abstract

In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial, and the reasons for such failures are still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse-reward, deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck at a fixed point corresponding to a poor solution, especially when the reward is not found very early. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms.
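To make the failure mode concrete, the sketch below sets up a DDPG-style deterministic actor-critic on a hypothetical 1D point-reaching task with a sparse reward. The environment, network sizes, and hyperparameters are illustrative assumptions, not the paper's experimental setup; the sketch only illustrates the mechanism the abstract alludes to: as long as exploration never reaches the rewarded region, every TD target equals zero, the critic flattens toward a constant, the actor's gradient through it vanishes, and the policy settles into a fixed point corresponding to a poor solution.

```python
# Minimal sketch (not the paper's code): DDPG-style deterministic actor-critic,
# without target networks, on an illustrative 1D sparse-reward task.
import numpy as np
import torch
import torch.nn as nn

GOAL, EPS, GAMMA = 1.0, 0.05, 0.99

def step(x, a):
    """Deterministic 1D point-mass: reward only inside a small goal region."""
    x_next = np.clip(x + 0.05 * float(a), -1.0, 2.0)
    r = 1.0 if abs(x_next - GOAL) < EPS else 0.0
    return x_next, r

actor = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

buffer, x = [], 0.0
for t in range(2000):
    with torch.no_grad():
        a = actor(torch.tensor([[x]], dtype=torch.float32)).item()
    a += 0.1 * np.random.randn()          # weak exploration noise
    x_next, r = step(x, a)
    buffer.append((x, a, r, x_next))
    x = 0.0 if r > 0 else x_next          # reset only when the goal is reached

    # Sample a batch and do one critic / one actor update.
    batch = [buffer[i] for i in np.random.randint(len(buffer), size=64)]
    s  = torch.tensor([[b[0]] for b in batch], dtype=torch.float32)
    a_ = torch.tensor([[b[1]] for b in batch], dtype=torch.float32)
    r_ = torch.tensor([[b[2]] for b in batch], dtype=torch.float32)
    s2 = torch.tensor([[b[3]] for b in batch], dtype=torch.float32)

    # If no rewarded transition is in the buffer, r_ is all zeros and the
    # critic is regressed toward GAMMA * (its own output): it collapses.
    with torch.no_grad():
        target = r_ + GAMMA * critic(torch.cat([s2, actor(s2)], dim=1))
    critic_loss = ((critic(torch.cat([s, a_], dim=1)) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # The actor ascends the critic; once the critic is flat, this gradient
    # vanishes and the deterministic policy freezes at an arbitrary action.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```

Logging the actor's output on the start state during training typically shows it freezing at a value unrelated to the goal once the critic has flattened, whereas runs in which the exploration noise happens to reach the goal early can still bootstrap a useful value function.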


Main file: article.pdf (994.13 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03080925, version 1 (08-10-2024)


Cite

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud. Understanding Failures of Deterministic Actor-Critic with Continuous Action Spaces and Sparse Rewards. Artificial Neural Networks and Machine Learning – ICANN 2020, Sep 2020, Bratislava, Slovakia. pp.308-320, ⟨10.1007/978-3-030-61616-8_25⟩. ⟨hal-03080925⟩
