• Waqas Azeem, Faculty of Computer Science, Government College University Faisalabad
• Chakir Aziza, Faculty of Law, Economics and Social Sciences, Hassan II University, Casablanca, Morocco

Keywords: Hybrid Roman Urdu, Data Mining, Feature Extraction, Text Classification, Sentiment Analysis

Text classification is the task of assigning labels to unlabeled text data. It has several applications, such as sentiment analysis, document classification, and fake news detection, and machine learning (ML) methods have been widely used for it in recent years. A fundamental problem with ML approaches is that they depend heavily on the feature selection method. This research evaluates several models and feature selection methods. Previous studies conclude that no single feature selection method works well for all types of classification tasks; moreover, Urdu is a resource-poor language. This study proposes a hybrid feature selection approach for Roman Urdu text that not only reduces the dimensionality of the feature map but also increases the accuracy of ML models. Using datasets of 11,000 and 20,000 records, the Support Vector Classifier, Naive Bayes, and Decision Tree reached accuracies of 80.81%, 72.94%, and 76.78%, respectively, among the tested methods. These best accuracy values were achieved by combining each classifier with the hybrid features ChiSAE, CorrelationAE, and GainRAE. In future work, text classification will be applied to better understand human self-analysis, and deep learning methods will be used to obtain more reliable results.
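The abstract does not spell out how the hybrid features are built. The names ChiSAE, CorrelationAE, and GainRAE resemble Weka's ChiSquaredAttributeEval, CorrelationAttributeEval, and GainRatioAttributeEval evaluators, but that reading is an assumption. The sketch below is a minimal, hypothetical illustration of a hybrid filter: it scores binary term-presence features with the chi-square statistic and with gain ratio, averages each term's rank under the two scorers, and keeps the top k terms. The rank-averaging rule, the helper names (`term_presence`, `chi_square`, `gain_ratio`, `hybrid_select`), and the tiny Roman Urdu toy corpus are illustrative assumptions, not the authors' method; correlation-based scoring is left out for brevity.

```python
import math
from collections import Counter


def term_presence(docs):
    """Hypothetical helper: turn each document into a set of lowercase tokens."""
    return [set(doc.lower().split()) for doc in docs]


def chi_square(docs_tokens, labels, term):
    """Chi-square statistic for term presence/absence vs. class label."""
    n = len(labels)
    present = [term in toks for toks in docs_tokens]
    n_present = sum(present)
    score = 0.0
    for c in sorted(set(labels)):
        n_c = labels.count(c)
        for flag, n_flag in ((True, n_present), (False, n - n_present)):
            observed = sum(1 for p, y in zip(present, labels) if p == flag and y == c)
            expected = n_flag * n_c / n
            if expected > 0:  # skip empty cells (e.g. a term present in every document)
                score += (observed - expected) ** 2 / expected
    return score


def entropy(values):
    """Shannon entropy (bits) of a non-empty list of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def gain_ratio(docs_tokens, labels, term):
    """Information gain of the term's presence, normalized by its split information."""
    present = [term in toks for toks in docs_tokens]
    n = len(labels)
    conditional = 0.0
    for flag in (True, False):
        subset = [y for p, y in zip(present, labels) if p == flag]
        if subset:
            conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(present)
    return info_gain / split_info if split_info > 0 else 0.0


def hybrid_select(docs, labels, k):
    """Assumed hybrid rule: average each term's rank under both scorers, keep the top k."""
    docs_tokens = term_presence(docs)
    vocab = sorted(set().union(*docs_tokens))
    rankings = []
    for scorer in (chi_square, gain_ratio):
        scores = {t: scorer(docs_tokens, labels, t) for t in vocab}
        ordered = sorted(vocab, key=lambda t: (-scores[t], t))  # best score first, ties by name
        rankings.append({t: rank for rank, t in enumerate(ordered)})
    combined = sorted(vocab, key=lambda t: (sum(r[t] for r in rankings) / len(rankings), t))
    return combined[:k]


if __name__ == "__main__":
    # Toy Roman Urdu corpus (illustrative only, not the paper's data).
    docs = ["yeh film bohat acha hai", "acha khana hai", "bohat bura din hai", "bura khana hai"]
    labels = ["pos", "pos", "neg", "neg"]
    print(hybrid_select(docs, labels, 2))  # → ['acha', 'bura']
```

On the toy corpus, `acha` ("good") and `bura` ("bad") each separate the two classes perfectly, so both scorers rank them first. A classifier such as a Support Vector Classifier or Naive Bayes would then be trained only on the selected terms, which is how a filter of this kind shrinks the feature map.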

Waqas Azeem, & Chakir Aziza. (2024). A Hybrid Feature Selection Approach for Roman Urdu Text Classification. Journal of Advancement in Computing, 2(1), 39–44. https://doi.org/10.36755/jac.v2i1.61