Employing Natural Language Processing Techniques for the Development of a Voting-Based POS Tagger in the Urdu Language

Ahmed Raza, Usama Ahmed, Kainat Saleem, Muhammad Sarwar, Momina Shaheen, Muhammad Sohail Farooq

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    Abstract

    Part-of-speech (POS) tagging, a sequence labeling task that assigns syntactic tags to words in context, plays an important role in various NLP applications. The core aim of this work is to determine the morpho-syntactic category of words in the Urdu language, which poses many computational challenges because of its dual nature. The work comprises several tasks. First, the authors identified the best combination of feature sets for a conditional random field (CRF) model to improve on previous results on two stable and well-known datasets: the Bushra Jawaid dataset and the CLE dataset. To handle syntactic ambiguity, a state-of-the-art voting method is introduced to resolve the contradictory results of different machine learning classifiers. The results show a significant improvement over the baseline: the F1-score is 94.8% on the first dataset and 95.7% on the second. A long short-term memory (LSTM) network is also applied to Urdu POS tagging, a highly diverse and inflectional task, achieving F1-scores of 86.7% and 96.1% on the two datasets, respectively.
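
The abstract does not specify how the voting method combines classifier outputs. As a rough illustration only, the Python sketch below assumes a simple per-token majority vote over the tag sequences produced by several taggers, with ties resolved in favour of the tagger listed first. The function name `vote_tags`, the tie-breaking rule, and the example tags are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def vote_tags(predictions):
    """Combine tag sequences from several taggers by per-token majority vote.

    `predictions` is a list of tag sequences, one per tagger, all covering
    the same tokens. Ties are broken in favour of the tagger that appears
    earliest in the list (an assumption made for this sketch).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    voted = []
    # zip(*predictions) yields, for each token, the tags proposed by every tagger.
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # First tag (in tagger order) that reaches the highest vote count.
        winner = next(tag for tag in token_tags if counts[tag] == best)
        voted.append(winner)
    return voted


# Hypothetical outputs of three taggers for a three-token sentence.
crf_tags = ["NN", "VB", "JJ"]
svm_tags = ["NN", "NN", "JJ"]
hmm_tags = ["PN", "VB", "RB"]
combined = vote_tags([crf_tags, svm_tags, hmm_tags])
```

Listing the most trusted tagger first gives it the deciding vote whenever the classifiers disagree evenly; a weighted vote would be another reasonable design under the same assumptions.
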
    Original language: English
    Title of host publication: Innovations in Optimization and Machine Learning
    Pages: 23-46
    Number of pages: 24
    Publication status: Published - 17 Jan 2025

    Publication series

    Name: Advances in Computational Intelligence and Robotics
    Publisher: IGI Global
    ISSN (Print): 2327-0411
