Understanding Failures of Deterministic Actor-Critic with Continuous Action Spaces and Sparse Rewards
Conference paper, year: 2020. Domain: Robotic Systems, Design and Control


Abstract

In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial, and the reasons for such failures are still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse-reward, deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck at a fixed point corresponding to a poor solution, especially when the reward is not found very early. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms.
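To make the failure mode concrete, the sketch below sets up a DDPG-style deterministic actor-critic on a hypothetical 1D point-reaching task with a sparse reward. The environment, network sizes, and hyperparameters are illustrative assumptions, not the paper's experimental setup; the sketch only illustrates the mechanism the abstract alludes to: as long as exploration never reaches the rewarded region, every TD target equals zero, the critic flattens toward a constant, the actor's gradient through it vanishes, and the policy settles into a fixed point corresponding to a poor solution.

```python
# Minimal sketch (not the paper's code): DDPG-style deterministic actor-critic,
# without target networks, on an illustrative 1D sparse-reward task.
import numpy as np
import torch
import torch.nn as nn

GOAL, EPS, GAMMA = 1.0, 0.05, 0.99

def step(x, a):
    """Deterministic 1D point-mass: reward only inside a small goal region."""
    x_next = np.clip(x + 0.05 * float(a), -1.0, 2.0)
    r = 1.0 if abs(x_next - GOAL) < EPS else 0.0
    return x_next, r

actor = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

buffer, x = [], 0.0
for t in range(2000):
    with torch.no_grad():
        a = actor(torch.tensor([[x]], dtype=torch.float32)).item()
    a += 0.1 * np.random.randn()          # weak exploration noise
    x_next, r = step(x, a)
    buffer.append((x, a, r, x_next))
    x = 0.0 if r > 0 else x_next          # reset only when the goal is reached

    # Sample a batch and do one critic / one actor update.
    batch = [buffer[i] for i in np.random.randint(len(buffer), size=64)]
    s  = torch.tensor([[b[0]] for b in batch], dtype=torch.float32)
    a_ = torch.tensor([[b[1]] for b in batch], dtype=torch.float32)
    r_ = torch.tensor([[b[2]] for b in batch], dtype=torch.float32)
    s2 = torch.tensor([[b[3]] for b in batch], dtype=torch.float32)

    # If no rewarded transition is in the buffer, r_ is all zeros and the
    # critic is regressed toward GAMMA * (its own output): it collapses.
    with torch.no_grad():
        target = r_ + GAMMA * critic(torch.cat([s2, actor(s2)], dim=1))
    critic_loss = ((critic(torch.cat([s, a_], dim=1)) - target) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # The actor ascends the critic; once the critic is flat, this gradient
    # vanishes and the deterministic policy freezes at an arbitrary action.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```

Logging the actor's output on the start state during training typically shows it freezing at a value unrelated to the goal once the critic has flattened, whereas runs in which the exploration noise happens to reach the goal early can still bootstrap a useful value function.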


Main file: article.pdf (994.13 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03080925, version 1 (08-10-2024)


Cite

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud. Understanding Failures of Deterministic Actor-Critic with Continuous Action Spaces and Sparse Rewards. Artificial Neural Networks and Machine Learning – ICANN 2020, Sep 2020, Bratislava, Slovakia. pp.308-320, ⟨10.1007/978-3-030-61616-8_25⟩. ⟨hal-03080925⟩
