Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits - Apprentissage de modèles visuels à partir de données massives Access content directly
Preprints, Working Papers, ... (Preprint) Year : 2024

Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits

Abstract

We address the problem of stochastic combinatorial semi-bandits, where a player can select from P subsets of a set containing d base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge on the reward distribution, like an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where P ≫ d but also when P ≤ d, which is not straightforward from most existing analyses.
Fichier principal
Vignette du fichier
main_hal.pdf (736.67 Ko) Télécharger le fichier
Origin : Files produced by the author(s)

Dates and versions

hal-04470568 , version 1 (22-02-2024)

Licence

Attribution

Identifiers

  • HAL Id : hal-04470568 , version 1

Cite

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel. Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits. 2024. ⟨hal-04470568⟩
83 View
66 Download

Share

Gmail Facebook X LinkedIn More