Deep reinforcement learning for weakly coupled MDP's with continuous actions
Conference paper. Year: 2024

Abstract

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints that depend on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which uses differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.
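To make the idea of incremental, greedy action selection under a shared resource budget more concrete, here is a minimal Python sketch. It is not the authors' LPCA-Greedy implementation: the per-arm functions in `q_funcs` are hypothetical concave stand-ins for learned Q-values, the Q-value slope is estimated by a finite difference rather than a network gradient, and the names `greedy_allocate`, `budget`, `step`, and `a_max` are assumptions made for this illustration only. The Lagrange multiplier used by LPCA is left out entirely.

```python
import math


def greedy_allocate(q_funcs, budget, step=0.01, a_max=1.0):
    """Spend a shared budget in small increments, one arm at a time.

    q_funcs: one callable per arm mapping an action in [0, a_max] to a
             value (hypothetical stand-ins for learned Q-values).
    budget:  total resource available, i.e. the bound on the sum of actions.
    step:    size of each increment.
    a_max:   per-arm upper bound on the action.
    """
    counts = [0] * len(q_funcs)              # increments given to each arm
    max_count = int(round(a_max / step))     # per-arm cap, in increments
    for _ in range(int(round(budget / step))):
        best_arm, best_gain = None, 0.0
        for i, q in enumerate(q_funcs):
            if counts[i] >= max_count:
                continue                     # this arm is already at a_max
            a = counts[i] * step
            gain = q(a + step) - q(a)        # finite-difference slope estimate
            if gain > best_gain:
                best_arm, best_gain = i, gain
        if best_arm is None:
            break                            # no arm gains from more resource
        counts[best_arm] += 1
    return [c * step for c in counts]


# Toy example: three arms with diminishing returns, q_i(a) = w_i * log(1 + a).
weights = [1.0, 2.0, 0.5]
q_funcs = [lambda a, w=w: w * math.log1p(a) for w in weights]
actions = greedy_allocate(q_funcs, budget=1.5)
print(actions)
```

In this toy setting the arm with the largest weight is filled first, and the remaining budget goes to the arm with the next-highest marginal value. A global optimizer such as differential evolution could be used in place of the greedy loop when the value functions are not concave.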
Main file: Deep reinforcement learning for weakly coupled MDPs with continuous actions.pdf (388.65 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04594762, version 1 (30-05-2024)
hal-04594762, version 2 (11-06-2024)

Cite

Francisco Robledo, Urtzi Ayesta, Konstantin Avrachenkov. Deep reinforcement learning for weakly coupled MDP's with continuous actions. ACM SIGMETRICS / ASMTA 2024, Jun 2024, Venice, Italy. ⟨hal-04594762v2⟩