A New Feature Selection Metric Based on Rough Sets and Information Gain in Text Classification

dc.contributor.author: Çekik, Rasim
dc.contributor.author: Kaya, Mahmut
dc.date.accessioned: 2024-12-24T19:18:22Z
dc.date.available: 2024-12-24T19:18:22Z
dc.date.issued: 2023
dc.department: Siirt Üniversitesi
dc.description.abstract: In text classification, treating the words in text documents as features creates a very high-dimensional feature space. This is known as the high dimensionality problem in text classification. The most common and effective way to address this problem is to select an ideal subset of features using a feature selection approach. In this paper, a new feature selection approach called Rough Information Gain (RIG) is presented as a solution to the high dimensionality problem. Rough Information Gain extracts hidden and meaningful patterns in text data with the help of Rough Sets and computes a score value based on these patterns. When pattern extraction is completely uncertain, the proposed approach uses the selection strategy of the Information Gain Selection (IG) approach. In the experimental studies, the Micro-F1 metric is used to compare Rough Information Gain with the Information Gain Selection (IG), Chi-Square (CHI2), Gini Coefficient (GI), and Discriminative Feature Selector (DFS) approaches. According to the results, the proposed Rough Information Gain approach outperforms the other methods.
dc.identifier.doi: 10.54287/gujsa.1379024
dc.identifier.endpage: 486
dc.identifier.issn: 2147-9542
dc.identifier.issue: 4
dc.identifier.startpage: 472
dc.identifier.trdizinid: 1218231
dc.identifier.uri: https://doi.org/10.54287/gujsa.1379024
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1218231
dc.identifier.uri: https://hdl.handle.net/20.500.12604/5100
dc.identifier.volume: 10
dc.indekslendigikaynak: TR-Dizin
dc.language.iso: en
dc.relation.ispartof: Gazi University Journal of Science Part A: Engineering and Innovation
dc.relation.publicationcategory: Article - National Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241222
dc.subject: Dimensionality Reduction
dc.subject: Feature Selection
dc.subject: Text Classification
dc.subject: Rough Set
dc.title: A New Feature Selection Metric Based on Rough Sets and Information Gain in Text Classification
dc.type: Article
