• Waqas Azeem, Faculty of Computer Science, Government College University Faisalabad
• Chakir Aziza, Faculty of Law, Economics and Social Sciences, Hassan II University, Casablanca, Morocco




Hybrid Roman Urdu, Data Mining, Feature Extraction, Text Classification, Sentiment Analysis


Text classification is the task of assigning labels to unlabeled text data. It has several applications, such as sentiment analysis, document classification, and fake news detection. Machine learning (ML) methods have been commonly used for text classification in the last several years. The fundamental problem is that these ML approaches, including the models used in this research, depend heavily on feature selection methods. Several past studies conclude that no single feature selection method works well for all types of classification tasks, and Urdu is moreover a resource-poor language. In this study, a proposed hybrid feature selection approach for Roman Urdu text not only reduces the dimension of the feature map but also increases the accuracy of ML models. Datasets of 11,000 and 20,000 records were used with Support Vector Classifier, Naive Bayes, and Decision Tree models, which achieved accuracies of 80.81%, 72.94%, and 76.78%, respectively, among other tested methods. The best accuracy values for each classifier were obtained with the hybrid features ChiSAE, CorrelationAE, and GainRAE. In future work, text classification will be used to better understand human self-analysis, and deep learning methods will be utilized for better authenticity.
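To make the idea of filter-based feature selection concrete, the following is a minimal sketch of chi-square term scoring, one standard filter that the name ChiSAE may allude to (the abstract does not define the hybrid methods, so this is not the authors' approach). The four-document Roman Urdu corpus, its labels, and the function names `chi_square_scores` and `select_top_k` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its 2x2
    term-presence/class table, keeping the maximum over classes."""
    n = len(docs)
    doc_terms = [set(doc.split()) for doc in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # a: term present, class cls; b: term present, other classes
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            c = n_cls - a          # term absent, class cls
            d = n - n_cls - b      # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:              # skip terms present in all or no documents
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Hypothetical toy corpus (illustrative only, not the paper's data).
docs = [
    "yeh film bohat achi hai",
    "bohat acha kaam hai",
    "yeh film bohat bekar hai",
    "bohat bura kaam hai",
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 4)
print(selected)  # ['acha', 'achi', 'bekar', 'bura']
```

On this toy corpus the vocabulary shrinks from nine terms to the four that occur in only one class, while words such as "hai" and "bohat", which appear in every document, score zero. A hybrid scheme could combine several such rankings, but how the paper does so is not stated in the abstract.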






How to Cite

Waqas Azeem, & Chakir Aziza. (2024). A HYBRID FEATURE SELECTION APPROACH FOR ROMAN URDU TEXT CLASSIFICATION. Journal of Advancement in Computing, 2(1), 39–44. https://doi.org/10.36755/jac.v2i1.61