A comparison of tree data structures in the streaming data clustering issue

Date

2024

Journal Title

Journal ISSN

Volume Title

Publisher

Gazi Univ, Fac Engineering Architecture

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Processing streaming data is challenging because time and resources are limited. Clustering is an efficient technique for analyzing data streams. This study proposes two new streaming data clustering algorithms, BT-AR Stream and VP-AR Stream, inspired by the KD-AR Stream clustering algorithm [32]. The proposed algorithms use the Ball-Tree and Vantage-Point Tree (VP-Tree) data structures, respectively, instead of the KD-Tree. To assess their efficiency, we tested the algorithms on 18 benchmark datasets in terms of clustering quality and runtime complexity, and then compared the results with those of the KD-AR Stream algorithm. According to the results, the BT-AR Stream algorithm was the most successful in terms of both clustering quality and runtime complexity, as illustrated in Figure A.

Purpose: This study aims to analyze and compare the efficiency of tree data structures in data stream clustering, in terms of both clustering quality and runtime performance.

Theory and Methods: To compare the efficiency of tree data structures in data stream clustering, we replaced the KD-Tree in KD-AR Stream with the Ball-Tree and the Vantage-Point Tree, which yielded two new stream clustering algorithms named BT-AR Stream and VP-AR Stream. We tested the three algorithms on 18 benchmark datasets and compared them in terms of clustering quality and runtime complexity.

Results: According to the experimental results, the BT-AR Stream algorithm, which uses the Ball-Tree, was the most successful in both clustering quality and runtime complexity on KDD, a high-dimensional dataset. On the other datasets, the clustering quality of all algorithms was good.

Conclusion: Although the clustering quality of all three algorithms was good, the BT-AR Stream algorithm was the most successful overall because of its advantage on the high-dimensional KDD dataset. It was also the fastest of the three algorithms.
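To make the idea of swapping one tree structure for another more concrete, the following is a minimal Python sketch of a vantage-point tree that answers fixed-radius neighbor queries. It is not the authors' BT-AR Stream or VP-AR Stream implementation. The names `VPNode`, `build_vp_tree`, and `range_query` are hypothetical, and the assumption that the clustering algorithm uses the tree mainly for radius searches is ours, not a statement from the abstract.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One node of a vantage-point tree (hypothetical illustration)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree: points with distance <= radius
        self.outside = outside  # subtree: points with distance > radius


def build_vp_tree(points):
    """Recursively build a vantage-point tree from a list of points."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = sorted(dist(vantage, p) for p in rest)
    radius = dists[len(dists) // 2]  # median split distance
    inside = [p for p in rest if dist(vantage, p) <= radius]
    outside = [p for p in rest if dist(vantage, p) > radius]
    return VPNode(vantage, radius, build_vp_tree(inside), build_vp_tree(outside))


def range_query(node, query, r, found=None):
    """Collect all stored points within distance r of query."""
    if found is None:
        found = []
    if node is None:
        return found
    d = dist(query, node.point)
    if d <= r:
        found.append(node.point)
    # Triangle inequality: a hit in the inside subtree requires d - r <= radius.
    if d - r <= node.radius:
        range_query(node.inside, query, r, found)
    # A hit in the outside subtree requires d + r > radius.
    if d + r > node.radius:
        range_query(node.outside, query, r, found)
    return found


# Small demonstration on synthetic 2-D data (not one of the 18 benchmark datasets).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
tree = build_vp_tree(points)
query, r = (0.5, 0.5), 0.1
hits = range_query(tree, query, r)
brute = [p for p in points if dist(query, p) <= r]
print(len(hits), "neighbors found; matches brute force:", sorted(hits) == sorted(brute))
```

Unlike a KD-Tree, which splits on coordinate axes, this structure partitions points only by their distance to a vantage point, so it needs nothing beyond a metric. That property is one commonly cited reason metric trees such as the Ball-Tree and VP-Tree are considered for high-dimensional data.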

Description

Keywords

Streaming data, Clustering, Tree data structures

Source

Journal of The Faculty of Engineering and Architecture of Gazi University

WoS Q Value

N/A

Scopus Q Value

Q2

Volume

39

Issue

1

Citation