Automatic Detection of Writing Proficiency Levels of Pakistani University Students Using NLP Techniques
DOI: https://doi.org/10.63056/academia.5.2(a).2026.1872
Keywords:
Natural Language Processing, Writing Proficiency, Machine Learning, Automated Essay Scoring, Text Analytics, Pakistan
Abstract
This paper presents an approach to automatically detecting the writing proficiency levels of Pakistani university students using Natural Language Processing (NLP) and machine learning. A quantitative experimental design was used: a dataset of undergraduate essays was preprocessed with tokenization, stop-word removal, and lemmatization. To capture linguistic patterns, textual features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF) weighting and n-gram analysis. Three classification models, Support Vector Machine (SVM), Logistic Regression, and Naive Bayes, were trained and evaluated with standard performance measures. The results indicate that all three models classify writing proficiency levels successfully, with SVM performing best owing to its ability to handle high-dimensional textual data. Moreover, vocabulary richness, grammatical accuracy, and syntactic complexity proved to be important predictors of writing quality. The study highlights the potential of NLP-based systems to provide high-quality, reliable, and scalable automated writing evaluation in higher education.
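The feature-extraction step described in the abstract (TF-IDF weighting over word n-grams) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ngrams` and `tfidf`, the toy essays, the bigram limit, and the unsmoothed `log(N / df)` form of IDF are all assumptions made for illustration. Libraries such as scikit-learn use smoothed variants, and the paper's preprocessing (stop-word handling, lemmatization) is omitted here.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Compute one {term: weight} dict per document.

    TF is the relative frequency of a term within the document.
    IDF is log(N / df), where N is the number of documents and df is
    the number of documents containing the term (an assumed, unsmoothed form).
    """
    # Whitespace tokenization only; a real pipeline would also lemmatize.
    term_lists = [ngrams(d.lower().split(), n_max) for d in docs]

    # Document frequency: count each term once per document.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical toy essays, not taken from the paper's dataset.
ESSAYS = [
    "the student writes clear essays",
    "the student writes short sentences",
    "complex arguments need careful structure",
]
VECTORS = tfidf(ESSAYS)
```

In this sketch, terms that occur in fewer essays receive larger weights, so an n-gram unique to one essay outweighs one shared by several. Sparse vectors of this kind are the sort of high-dimensional input that classifiers such as SVM, Logistic Regression, and Naive Bayes would then be trained on.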
License
Copyright (c) 2026 Dr. Abdul Khaliq, Khizar Mumtaz, Ali Abbas (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.







