heattaya.blogg.se - Arabic part of speech tagger

In Natural Language Processing (NLP), word segmentation and part-of-speech (POS) tagging are fundamental tasks. POS information is also necessary in the preprocessing work of NLP applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, with different methods, to get high performance and accuracy. For the Myanmar language, there are also separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NN) and Hidden Markov Models (HMMs). However, because of the Myanmar language's complex morphological structure, the out-of-vocabulary (OOV) problem still exists. The main goal of developing a POS tagger for any language is to improve tagging accuracy and remove the ambiguity in sentences that arises from language structure. To avoid errors and improve segmentation by using POS information, segmentation and tagging can be performed at the same time.

Part-of-speech (PoS) tagging is still not very well investigated for the Arabic language. Determining the PoS tag of a word in a particular context is difficult, primarily because most contemporary texts do not use diacritics. Consequently, the same word may be spelled in different ways. Further, detecting the difference between Arabic derivatives is a very challenging issue for the majority of PoS taggers. Hence, assigning the correct PoS tags requires advanced processing and considerable resources. This study aims to design detailed hierarchical levels of the Arabic tagset categories and their relationships. These hierarchical levels allow easier expansion when required and produce more accurate and precise results. They are based on a comparative study and important references in Arabic grammar, and they are also validated by experts in this field. In addition, the proposed tagset is implemented in a PoS tagger and tested via various experiments. We believe that our study makes a significant contribution to the literature because it is an advancement toward a standard, rich, and comprehensive tagset for Arabic.

A part-of-speech (POS) tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval, and multi-word term extraction. This study proposes an efficient and accurate POS tagging technique for the Arabic language using a statistical approach. The Arabic rule-based method suffers from misclassified and unanalyzed words because of ambiguity. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based method. Our POS tagger generates a set of four POS tags: Noun, Verb, Particle, and Quranic Initial (INL). The proposed technique uses the contextual information of the words together with a variety of features that help predict the POS classes. To evaluate its accuracy, the proposed method was trained and tested on the Holy Quran Corpus, which contains 77,430 terms of undiacritized Classical Arabic. The experimental results demonstrate the efficiency of our method for Arabic POS tagging: the obtained accuracies are 97.6% for our method and 94.4% for the rule-based tagger.
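To make the HMM idea mentioned above more concrete, here is a minimal sketch of Viterbi decoding over the four tags named in the text (Noun, Verb, Particle, INL). It is not the authors' implementation: the probabilities, the transliterated toy tokens, the `UNKNOWN` floor, and the function name `viterbi` are all assumptions made for illustration, and the rule-based component that the study combines with the HMM is left out entirely.

```python
import math

# Tag names follow the four tags named in the text; everything else is toy data.
TAGS = ["NOUN", "VERB", "PARTICLE", "INL"]

# Hypothetical start and transition probabilities, invented for illustration.
START = {"NOUN": 0.4, "VERB": 0.3, "PARTICLE": 0.25, "INL": 0.05}
TRANS = {
    "NOUN":     {"NOUN": 0.4, "VERB": 0.2, "PARTICLE": 0.35, "INL": 0.05},
    "VERB":     {"NOUN": 0.6, "VERB": 0.05, "PARTICLE": 0.3, "INL": 0.05},
    "PARTICLE": {"NOUN": 0.6, "VERB": 0.3, "PARTICLE": 0.05, "INL": 0.05},
    "INL":      {"NOUN": 0.5, "VERB": 0.2, "PARTICLE": 0.25, "INL": 0.05},
}
# Hypothetical emission probabilities over transliterated toy tokens.
EMIT = {
    "NOUN":     {"kitab": 0.5, "rajul": 0.5},
    "VERB":     {"kataba": 0.6, "qara": 0.4},
    "PARTICLE": {"fi": 0.5, "min": 0.5},
    "INL":      {"alm": 1.0},
}
UNKNOWN = 1e-6  # assumed floor probability for a word never seen with a tag


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # Each row maps tag -> (best log-probability of a path ending here, previous tag).
    first = words[0]
    rows = [{
        t: (math.log(START[t]) + math.log(EMIT[t].get(first, UNKNOWN)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        row = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(word, UNKNOWN))
            # Pick the previous tag that maximizes the path score into tag t.
            prev, score = max(
                ((p, rows[-1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            row[t] = (score + emit, prev)
        rows.append(row)
    # Backtrack from the best final tag through the stored previous tags.
    tag = max(TAGS, key=lambda t: rows[-1][t][0])
    path = [tag]
    for row in reversed(rows[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["kataba", "rajul", "fi", "kitab"]))
# ['VERB', 'NOUN', 'PARTICLE', 'NOUN']
```

In a real tagger the three probability tables would be estimated from an annotated corpus rather than written by hand, and unknown words would need a better treatment than a fixed floor value.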