Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset
Date
2017
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Figen Yıldız
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Discretization is a data pre-processing task that transforms continuous variables into
discrete ones so that data mining algorithms such as association rule extraction and
classification trees can be applied. In this study we empirically compared the
performance of the equal width intervals (EWI), equal frequency intervals (EFI) and
K-means clustering (KMC) methods in discretizing 14 continuous variables in a chicken egg
quality traits dataset. We found that these unsupervised discretization methods can
decrease the training error rates and increase the test accuracies of classification tree
models. By comparing the training errors and test accuracies of models fitted with the
C5.0 classification tree algorithm, we also found that the EWI, EFI and KMC methods
produced broadly similar results. Among the rules used for estimating the
number of intervals, the Rice rule gave the best result with EWI but not with EFI. The
Freedman-Diaconis rule with EFI and the Doane rule with EFI and EWI also performed
slightly better than the other rules.
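
As an illustration of the three unsupervised methods named in the abstract, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the quantile-based initialization of the K-means centers, and the iteration limit are assumptions made for this example. Only the Rice rule is shown for choosing the number of intervals; the Freedman-Diaconis and Doane rules are not sketched here.

```python
import math
import numpy as np


def rice_bins(n):
    """Rice rule for the number of intervals: k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


def equal_width(x, k):
    """EWI: split [min, max] into k intervals of equal length."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    # Only the interior edges are passed, so labels run from 0 to k-1.
    return np.digitize(x, edges[1:-1], right=True)


def equal_frequency(x, k):
    """EFI: cut at quantiles so each interval holds about n/k values."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.digitize(x, edges[1:-1], right=True)


def kmeans_1d(x, k, iters=100):
    """KMC: one-dimensional K-means (Lloyd's algorithm).

    Assumption: centers start at evenly spaced quantiles, which keeps
    the sketch deterministic; other initializations are possible.
    """
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical skewed variable standing in for one continuous trait.
x = np.exp(np.linspace(0, 5, 100))
k = rice_bins(len(x))
ewi = equal_width(x, k)
efi = equal_frequency(x, k)
kmc = kmeans_1d(x, k)
```

On skewed data such as the hypothetical `x` above, EWI places most values in the lowest interval while EFI spreads them evenly, which is the main practical difference between the two.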
Description
Keywords
Data preprocessing, Discretization, Unsupervised discretization, Egg quality traits, Classification trees
Source
Turkish Journal of Agriculture - Food Science and Technology
WoS Q Value
Scopus Q Value
Volume
5
Issue
4
Citation
Cebeci Z., Yildiz F. (2017). Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset. Turkish Journal of Agriculture - Food Science and Technology, vol.5, no.4, pp.315-320.