A Relative Analysis on Machine Learning Approaches for Effective POS Tagging of Tamil Language

Sheshasaayee Ananthi1,**, Angela Deepa V.R.2,*

1Research Supervisor, PG and Research, Department of Computer Science, Quaid-E-Millath Government College for Women (Autonomous), Chennai, 600 002, Tamil Nadu, India
2Research Scholar, PG and Research, Department of Computer Science, Quaid-E-Millath Government College for Women (Autonomous), Chennai, 600 002, Tamil Nadu, India

*Corresponding author E-mail id: angelrajan.research@gmail.com
**ananthi.research@gmail.com

Abstract

Part-of-speech (POS) tagging is the process of identifying, for each word in a document, a suitable tag that reflects the word's meaning in its particular context. This process plays a key role in building an effective natural language processing (NLP) application. Morphological complexity and varying grammatical constructs have led to a variety of tagging approaches. For a highly agglutinative language like Tamil, different approaches have been used for POS tagging, including rule-based, stochastic, and transformation-based learning approaches. This article deals with memory-based language processing (MBLP), a novel approach to NLP based on a symbolic machine learning method termed memory-based learning (MBL). MBLP, like a support vector machine (SVM), is a machine learning approach to language processing; it is based on the idea that processing is guided by the direct reuse of memory traces of earlier language experiences rather than by rules extracted from such experiences. Through a comparative study of MBLP and SVM taggers as used for languages such as Dutch and Malayalam, this article examines the differences between the two approaches and what they suggest for a new way of building taggers for the Tamil language.

Keywords: Annotated corpora, Machine learning, Parts-of-speech, Memory-based language processing (MBLP), Support vector machine (SVM), Tagging, Agglutinative.
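
The memory-based idea described in the abstract (reusing stored examples directly instead of extracting rules) can be illustrated with a minimal sketch. It is not the tagger studied in this article: the class name `MemoryBasedTagger`, the feature set (previous word, word, next word, three-character suffix), the simple overlap similarity, and the toy English training sentences are all assumptions chosen for illustration.

```python
from collections import Counter

def features(words, i):
    """Context features for the word at position i (illustrative choice)."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i < len(words) - 1 else "</s>"
    # The suffix stands in for the morphological cues that matter
    # in agglutinative languages; three characters is an arbitrary choice.
    return (prev_w, words[i], next_w, words[i][-3:])

class MemoryBasedTagger:
    """Hypothetical k-nearest-neighbour tagger: it stores every training
    instance and tags new words by similarity, with no rule extraction."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature tuple, tag) pairs

    def train(self, tagged_sentences):
        # "Training" only memorises the instances.
        for sent in tagged_sentences:
            words = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.memory.append((features(words, i), tag))

    def tag(self, words):
        result = []
        for i, word in enumerate(words):
            query = features(words, i)
            # Overlap similarity: number of feature positions that match.
            ranked = sorted(
                self.memory,
                key=lambda inst: -sum(a == b for a, b in zip(query, inst[0])),
            )
            # Majority vote among the k most similar stored instances.
            votes = Counter(tag for _, tag in ranked[: self.k])
            result.append((word, votes.most_common(1)[0][0]))
        return result

# Toy training data (assumed, English for readability).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barked", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("jumped", "VERB")],
]

tagger = MemoryBasedTagger(k=1)
tagger.train(TRAIN)
print(tagger.tag(["a", "dog", "walked"]))
```

Here the unseen word "walked" is tagged by its nearest stored instance, which shares its left context, right context, and suffix. An SVM tagger would instead fit a separating model over such features and discard the individual training instances after training.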