A novel feature extraction approach for text-based language identification: Binary patterns

Kaya, Yilmaz; Ertugrul, Omer Faruk

Date

2016

Authors

Kaya, Yilmaz

Ertugrul, Omer Faruk

Publisher

Gazi Univ, Fac Engineering Architecture

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Language identification (LI), a major task in natural language processing, is the process of determining the language of a given piece of content. This paper proposes a novel approach based on the probability of use of characters that have similar orderings with respect to their UTF-8 values. To evaluate and validate the approach, four datasets containing texts in different numbers of languages were employed. Features extracted by the one-dimensional local binary pattern (1D-LBP) method were classified by various machine learning methods. The LI accuracies achieved on the four datasets were 86.20%, 92.75%, 100%, and 89.77%, respectively. The results show that the proposed approach yields high success rates and is an efficient way of identifying language.
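
The abstract does not give the details of the 1D-LBP feature extraction, so the following is only a minimal sketch of how such features might be computed, not the authors' implementation. It assumes a window of four neighbours on each side of every character, a "greater than or equal to the centre" comparison, Unicode code points (via `ord`) as a stand-in for the UTF-8 values, and a normalised histogram of the resulting codes as the feature vector. The function names `lbp_1d_codes` and `lbp_1d_histogram` and the `half_window` parameter are hypothetical.

```python
from collections import Counter


def lbp_1d_codes(text, half_window=4):
    """Return one binary-pattern code per character that has a full window.

    Assumption: each neighbour contributes a 1 bit when its code point is
    greater than or equal to that of the centre character.
    """
    values = [ord(ch) for ch in text]  # code points, standing in for UTF-8 values
    codes = []
    for i in range(half_window, len(values) - half_window):
        center = values[i]
        # Neighbours to the left and right of the centre, centre excluded.
        neighbors = values[i - half_window:i] + values[i + 1:i + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_1d_histogram(text, half_window=4):
    """Return the relative frequency of every possible code as a feature vector."""
    codes = lbp_1d_codes(text, half_window)
    n_bins = 1 << (2 * half_window)  # 256 bins for the assumed 8 neighbours
    hist = [0.0] * n_bins
    if not codes:
        return hist  # text shorter than one full window
    for code, count in Counter(codes).items():
        hist[code] = count / len(codes)
    return hist
```

Under these assumptions, each text is reduced to a fixed-length vector (256 values for eight neighbours) that could be passed to any standard classifier. The window size and the comparison rule are the main choices one would want to check against the full paper.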

Keywords

Text-based language identification, one-dimensional local binary patterns, natural language processing, feature extraction

Source

Journal of The Faculty of Engineering and Architecture of Gazi University

WoS Q Value

Q4

Scopus Q Value

Q2

Volume

31

Issue

4

Link

https://doi.org/10.17341/gazimmfd.278463
https://hdl.handle.net/20.500.12604/7519

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
