Reservoir Pattern Sampling in Data Streams
Abstract
Many applications generate data streams that must be analyzed online. In this context, pattern mining is a difficult task because it normally requires access to all data observations. To overcome this problem, state-of-the-art methods maintain either a data sample or a compact data structure that retains only recent information on the main patterns. This paper addresses online pattern discovery in data streams using pattern sampling techniques. Building on reservoir sampling, we propose a generic algorithm, named ResPat, that uses limited memory and supports a wide spectrum of temporal biases simulating a landmark window, a sliding window, or an exponentially damped window. For these three window models, we provide fast damping optimizations and study their time complexity. Experiments show that the ResPat algorithms perform well. Finally, we illustrate the usefulness of our approach with online outlier detection in data streams.
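For readers unfamiliar with the reservoir sampling technique the abstract builds on, the following is a minimal sketch of the classic uniform variant (often called Algorithm R). It is not the ResPat algorithm: it samples raw stream items uniformly, with no temporal bias and no pattern extraction. The names `reservoir_sample`, `k`, and `rng` are illustrative assumptions, not taken from the paper.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Classic reservoir sampling: the stream is read once and only k items
    are kept in memory, whatever the length of the stream.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical usage: keep 10 items from a stream of 1000 integers.
    print(reservoir_sample(range(1000), 10, random.Random(0)))
```

The memory footprint is fixed by `k`, which is the property that makes reservoir-based approaches attractive for streams of unbounded length.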
Domains
Artificial Intelligence [cs.AI]
Origin: Files produced by the author(s)