A novel feature extraction approach for text-based language identification: Binary patterns

dc.contributor.author: Kaya, Yilmaz
dc.contributor.author: Ertugrul, Omer Faruk
dc.date.accessioned: 2024-12-24T19:30:24Z
dc.date.available: 2024-12-24T19:30:24Z
dc.date.issued: 2016
dc.department: Siirt Üniversitesi
dc.description.abstract: Language identification (LI), a major task in natural language processing, is the process of determining the language of a given content. This paper proposes a novel approach based on the probability of use of characters that have similar orders with respect to their UTF-8 values. To evaluate and validate the proposed approach, four datasets containing texts in different numbers of languages were employed. In the proposed approach, features extracted by the one-dimensional local binary pattern (1D-LBP) method were classified by various machine learning methods. The LI accuracies achieved on the four datasets were 86.20%, 92.75%, 100% and 89.77%, respectively. The results show that the proposed approach yields high success rates and is an efficient way of identifying language.
dc.identifier.doi: 10.17341/gazimmfd.278463
dc.identifier.endpage: 1094
dc.identifier.issn: 1300-1884
dc.identifier.issn: 1304-4915
dc.identifier.issue: 4
dc.identifier.scopus: 2-s2.0-85015742100
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 1085
dc.identifier.uri: https://doi.org/10.17341/gazimmfd.278463
dc.identifier.uri: https://hdl.handle.net/20.500.12604/7519
dc.identifier.volume: 31
dc.identifier.wos: WOS:000392927000027
dc.identifier.wosquality: Q4
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: tr
dc.publisher: Gazi Univ, Fac Engineering Architecture
dc.relation.ispartof: Journal of The Faculty of Engineering and Architecture of Gazi University
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_20241222
dc.subject: Text-based language identification
dc.subject: one dimensional local binary patterns
dc.subject: natural language processing
dc.subject: feature extraction
dc.title: A novel feature extraction approach for text-based language identification: Binary patterns
dc.title.alternative: Doküman Dili tanima için yeni bir öznitelik çikarim yaklasimi: Ikili desenler
dc.type: Article
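
The abstract describes extracting 1D-LBP features from the character values of a text. The record gives no implementation details, so the following is only a minimal sketch of a generic 1D-LBP histogram, not the authors' code. The function name `lbp_1d_histogram`, the `half_window` parameter, the default of four neighbors on each side, the `>=` comparison, and the use of Unicode code points (via `ord`) in place of the "UTF-8 values" are all assumptions made for illustration.

```python
from collections import Counter


def lbp_1d_histogram(text, half_window=4):
    """Return a normalized histogram of 1D-LBP codes for a text.

    Hypothetical helper: each character is compared with `half_window`
    neighbors on each side, and the comparison bits form one binary code.
    """
    # Assumption: Unicode code points stand in for the "UTF-8 values".
    values = [ord(ch) for ch in text]
    n_bits = 2 * half_window
    counts = Counter()

    # Only positions with a full window on both sides produce a code.
    for i in range(half_window, len(values) - half_window):
        center = values[i]
        neighbors = (values[i - half_window:i]
                     + values[i + 1:i + 1 + half_window])
        code = 0
        for bit, neighbor in enumerate(neighbors):
            # Set the bit when the neighbor is not smaller than the center.
            if neighbor >= center:
                code |= 1 << bit
        counts[code] += 1

    # Relative frequency of each possible code (2**n_bits bins).
    total = sum(counts.values())
    hist = [0.0] * (1 << n_bits)
    if total:
        for code, count in counts.items():
            hist[code] = count / total
    return hist
```

A vector such as `lbp_1d_histogram(sample_text)` could then be passed to any standard classifier; which classifiers and window sizes the paper actually used is not stated in this record.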
