Yazar "Senol, Ali" seçeneğine göre listele
Listeleniyor 1 - 2 / 2
Sayfa Başına Sonuç
Sıralama seçenekleri
Öğe A comparison of tree data structures in the streaming data clustering issue(Gazi Univ, Fac Engineering Architecture, 2024) Senol, Ali; Kaya, Mahmut; Canbay, YavuzProcessing streaming data is a challenging issue because of the limitation of time and resources. Clustering data streams is an efficient technique to analyze this kind of data. This study proposes two new streaming data clustering algorithms, BT-AR Stream and VP-AR Stream, inspired by the KD-AR Stream clustering algorithm [32]. Our algorithms used Ball-Tree and Vintage Tree data structures instead of KD-Tree. To reveal the efficiency of the proposed algorithms, we tested the algorithms on 18 benchmark datasets in terms of clustering qualities and runtime complexities. Then we compared obtained results with the results of the KD-AR Stream algorithm. According to the results, the BT-AR Stream algorithm was the most successful in terms of clustering quality and runtime complexity, as illustrated in Figure A.Purpose: This study aims to analyze and compare the efficiency of tree data structure in data stream clustering issues. We aim to reveal the efficiency of tree data structures in both clustering quality and runtime performance.Theory and Methods: To compare the efficiency of tree data structures in data stream clustering, we proposed two stream clustering algorithms inspired by KD-AR Stream. For this reason, we used Ball-Tree and Vintage-Tree data structures instead of KD-Tree and proposed two new stream clustering algorithms named BT-AR Stream and VP-AR Stream. To compare the success of algorithms, we tested them on 18 benchmark datasets and compared them in aspects of clustering quality and runtime complexity.Results: According to the results obtained in the experimental study, the BT-AR Stream algorithm, which uses Ball-Tree, was the most successful in both clustering quality and runtime complexity on the KDD, which is a high-dimensional dataset. On the other hand, the clustering quality of all algorithms was good on the other datasets. Conclusion: Although the clustering quality of all three algorithms was good, the BT-AR Stream algorithm was the most successful because KDD is high-dimensional. Furthermore, it is the fastest algorithm compared to the others.Öğe MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data(Springer London Ltd, 2024) Erdinc, Berfin; Kaya, Mahmut; Senol, AliStream clustering has emerged as a vital area for processing streaming data in real-time, facilitating the extraction of meaningful information. While efficient approaches for defining and updating clusters based on similarity criteria have been proposed, outliers and noisy data within stream clustering areas pose a significant threat to the overall performance of clustering algorithms. Moreover, the limitation of existing methods in generating non-spherical clusters underscores the need for improved clustering quality. As a new methodology, we propose a new stream clustering approach, MCMSTStream, to overcome the abovementioned challenges. The algorithm applies MST to micro-clusters defined by using the KD-Tree data structure to define macro-clusters. MCMSTStream is robust against outliers and noisy data and has the ability to define clusters with arbitrary shapes. Furthermore, the proposed algorithm exhibits notable speed and can handling high-dimensional data. ARI and Purity indices are used to prove the clustering success of the MCMSTStream. The evaluation results reveal the superior performance of MCMSTStream compared to state-of-the-art stream clustering algorithms such as DenStream, DBSTREAM, and KD-AR Stream. The proposed method obtained a Purity value of 0.9780 and an ARI value of 0.7509, the highest scores for the KDD dataset. In the other 11 datasets, it obtained much higher results than its competitors. As a result, the proposed method is an effective stream clustering algorithm on datasets with outliers, high-dimensional, and arbitrary-shaped clusters. In addition, its runtime performance is also quite reasonable.