Nested Named Entity Recognition in Punjabi Text

Kaur Amandeep 1,*, Josan Gurpreet Singh 2

1 Research Scholar, Department of Computer Engineering, Punjabi University, Patiala, Punjab, India
2 Assistant Professor, Department of Computer Science, Punjabi University, Patiala, Punjab, India
* Corresponding author. E-mail: amandhillon83@yahoo.co.in
Abstract

Nested named entities are useful in named entity recognition (NER) because they help identify relationships between entities and the internal semantics of entities. Nevertheless, the recognition of nested structures has been largely ignored in NER research. This paper presents nested NER research conducted for the Punjabi language. Because no standardised nested named entity (NE) tagset is defined in the literature, a nested NE tagset comprising 22 nested cases has been formulated from the annotated corpus prepared for Punjabi NER research. This corpus was re-annotated with nested NEs using the proposed tagset and joined label tagging, and experiments were conducted with different feature combinations. The feature set that achieved the highest F-score, 91.30%, consists of a context word window of size 7, part-of-speech (POS) information, word length, digit information, prefixes and suffixes, gazetteers, and context patterns.

Keywords

Nested Named Entities, Nested named entity recognition, Nested named entity tagset, Joined label tagging, Conditional random fields, Punjabi language, Language Independent and Dependent Features.
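The following is a minimal illustrative sketch of two ideas named in the abstract: joined label tagging and per-token feature extraction for a sequence labeller such as a conditional random field. It is not the authors' implementation. The function names, the `+` separator, the BIO-style tags (`B-ORG`, `B-LOC`), the `<PAD>` marker, and the affix length of 3 are all assumptions made for illustration, and the example tokens are English placeholders rather than Punjabi text. Joined label tagging is assumed here to mean that the outer and inner entity labels of a token are concatenated into one label, and a window of size 7 is assumed to mean the current word plus three words on each side. Context patterns are omitted because the abstract does not describe how they are computed.

```python
from typing import Dict, List, Set


def join_labels(outer: List[str], inner: List[str], sep: str = "+") -> List[str]:
    """Merge outer and inner entity labels into one joined label per token.

    Assumption: 'O' marks a token outside any entity at that nesting level.
    """
    if len(outer) != len(inner):
        raise ValueError("outer and inner label sequences must have equal length")
    joined = []
    for o, i in zip(outer, inner):
        if i == "O":
            joined.append(o)      # no nested entity: keep the outer label
        elif o == "O":
            joined.append(i)      # only an inner label is present
        else:
            joined.append(f"{o}{sep}{i}")  # nested case: concatenate both
    return joined


def token_features(
    tokens: List[str],
    pos_tags: List[str],
    idx: int,
    gazetteer: Set[str],
    window: int = 7,
    affix_len: int = 3,
) -> Dict[str, object]:
    """Build a feature dictionary for the token at position idx."""
    word = tokens[idx]
    feats: Dict[str, object] = {
        "pos": pos_tags[idx],                              # POS information
        "length": len(word),                               # length of word
        "has_digit": any(ch.isdigit() for ch in word),     # digit information
        "in_gazetteer": word in gazetteer,                 # gazetteer lookup
    }
    # Prefixes and suffixes of length 1..affix_len (hypothetical choice).
    for n in range(1, affix_len + 1):
        if len(word) > n:
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    # Context word window: current word plus window // 2 words on each side.
    half = window // 2
    for offset in range(-half, half + 1):
        j = idx + offset
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


# Usage: an organisation name that contains a location name.
tokens = ["Punjabi", "University", "Patiala"]
pos_tags = ["NNP", "NNP", "NNP"]          # placeholder POS tags
outer = ["B-ORG", "I-ORG", "I-ORG"]       # outer entity: organisation
inner = ["O", "O", "B-LOC"]               # inner entity: location
labels = join_labels(outer, inner)
features = [token_features(tokens, pos_tags, i, {"Patiala"}) for i in range(len(tokens))]
```

With joined labels, a single-layer sequence model can be trained on one label per token, at the cost of a larger label set. The feature dictionaries above are in the form accepted by common CRF toolkits, but the choice of toolkit is left open here.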