Journal article, Data and Knowledge Engineering, 2024

Leveraging an Isolation Forest to Anomaly Detection and Data Clustering

Abstract

Understanding why some points in a data set are considered anomalies cannot be done without taking into account the structure of the regular points. Whereas many machine learning methods are dedicated either to identifying anomalies or to identifying the inner structure of the data, a solution is introduced that addresses both tasks with a single data model, a variant of an isolation forest. The initial algorithm for constructing an isolation forest is revisited so as to preserve the inner structure of the data without affecting the efficiency of outlier detection. Experiments conducted on both synthetic and real-world data sets show that, in addition to improving the detection of abnormal data points, the proposed variant of the isolation forest allows the subspaces of high density to be reconstructed. This variant can therefore serve as a basis for a unified approach to detecting global and local anomalies, which is a necessary condition for then providing users with informative descriptions of the data.
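As background only, the sketch below shows the standard isolation forest of Liu, Ting and Zhou that the article takes as its starting point: random axis-parallel splits isolate points, and points isolated after few splits (short average path length) receive scores close to 1. It does not reproduce the structure-preserving variant proposed in the article. All function names (`build_tree`, `isolation_forest`, `anomaly_score`) and the default parameters are hypothetical choices for illustration, and, as a small simplification, the split attribute is drawn only among attributes that still vary within a node.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful search in a binary search tree
    of n points, used to normalise path lengths (standard iForest constant)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively build one isolation tree with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only attributes that still vary in this node can be split
    splittable = [d for d in range(len(points[0]))
                  if min(p[d] for p in points) < max(p[d] for p in points)]
    if not splittable:
        return {"size": len(points)}
    dim = rng.choice(splittable)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)  # random threshold between min and max
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, corrected by c(size) for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample of size psi (needs >= 2 points)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit of the original method
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """s(x) = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


if __name__ == "__main__":
    rng = random.Random(42)
    inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    data = inliers + [(8.0, 8.0)]  # one isolated point far from the dense cluster
    trees, psi = isolation_forest(data)
    print("outlier:", round(anomaly_score((8.0, 8.0), trees, psi), 3))
    print("centre :", round(anomaly_score((0.0, 0.0), trees, psi), 3))
```

Running the script should print a noticeably higher score for the isolated point than for the centre of the cluster. The flat tree leaves only count points; recovering the dense subspaces themselves, as the abstract describes, is precisely what the article's modified construction is meant to add.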
Main file: main.pdf (711.45 KB)
Origin: files produced by the author(s)

Dates and versions

hal-04516593, version 1 (22-03-2024)

Cite

Véronne Yepmo, Grégory Smits, Marie-Jeanne Lesot, Olivier Pivert. Leveraging an Isolation Forest to Anomaly Detection and Data Clustering. Data and Knowledge Engineering, 2024, 151, pp.102302. ⟨10.1016/j.datak.2024.102302⟩. ⟨hal-04516593⟩