Ten rob-CNN: A hybrid feature extraction and fuzzy decision-making approach for Twitter spam detection
- Department of Computer Engineering, Bo.C., Islamic Azad University, Borujerd, Iran
- Department of Mathematics and Computer Science, CT.C., Islamic Azad University, Tehran, Iran
- Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran
- Department of Mathematics and Computer Science, Shahed University, Tehran, Iran
- Institute of Converging Sciences and Technologies, CT.C., Islamic Azad University, Tehran, Iran
Received: 2020-02-28
Revised: 2025-05-15
Accepted: 2025-06-25
Published in Issue 2025-08-23
Copyright (c) 2025 Azam Shekari Shahrak, Nasser Mikaeilvand, Seyed Javad Mirabedini, Seyyed Hamid Haji Seyyed Javadi, Nayereh Zaghari

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
With the growth of information technology, the use of cyberspace is increasing daily, and individuals constantly exchange information through various platforms. Among these, social networks have attracted growing attention, as users spend hours each day on them for business or information sharing. As social media usage increases, spam becomes more prevalent. Companies and institutions worldwide spend billions of dollars combating spam, and researchers have accordingly focused on spam detection. At the same time, natural language processing is advancing significantly with the aid of large language models, which have demonstrated remarkable performance. This study therefore employs the RoBERTa transformer, one of the most advanced transformers for text classification, and tunes its hyperparameters using TensorFlow to enhance its performance. Spam filtering is typically performed based on either message content or non-content features; in this study, features are extracted from both. After preprocessing, the RoBERTa model is used for keyword extraction and vector representation of the text. The vector resulting from tweet tokenization is fed into a Convolutional Neural Network (CNN) for classification. The CNN's result for the tweet is then combined with the non-content features and passed to a fuzzy system for the final classification. A Twitter dataset of approximately 12,000 records was used for training and testing. The results indicate that fine-tuning the transformer and classifying with a neural network can raise spam detection accuracy to 99.82%.
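
The final stage described above, in which the CNN's content-based result is combined with non-content features in a fuzzy system, might look roughly like the following minimal sketch. The abstract does not specify the membership functions, the rule base, or which non-content features are used, so everything here is an assumption made for illustration: the function names, the single `account_risk` input (a hypothetical normalized score standing in for the non-content features), the breakpoints 0.3 and 0.7, the four rules, and the 0.5 decision threshold. It uses a zero-order Sugeno-style weighted average, which may differ from the inference scheme the authors use.

```python
def rising(x, lo, hi):
    """Membership in a 'high' fuzzy set: 0 below lo, 1 above hi, linear between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def fuzzy_spam_score(cnn_score, account_risk):
    """Combine a content score and a non-content score into one spam score.

    cnn_score    -- spam probability in [0, 1] from the content classifier
    account_risk -- hypothetical normalized non-content score in [0, 1]
    Breakpoints and rule outputs are illustrative assumptions, not the paper's.
    """
    content_high = rising(cnn_score, 0.3, 0.7)
    content_low = 1.0 - content_high
    risk_high = rising(account_risk, 0.3, 0.7)
    risk_low = 1.0 - risk_high

    # Each rule: (firing strength using min as fuzzy AND, crisp output level).
    rules = [
        (min(content_high, risk_high), 1.0),  # both suspicious -> spam
        (min(content_high, risk_low), 0.7),   # suspicious text only
        (min(content_low, risk_high), 0.4),   # suspicious account only
        (min(content_low, risk_low), 0.0),    # neither -> legitimate
    ]

    # Weighted average of rule outputs (zero-order Sugeno defuzzification).
    # The total is always >= 0.5 because the high/low memberships are complements.
    total = sum(strength for strength, _ in rules)
    return sum(strength * level for strength, level in rules) / total


def is_spam(cnn_score, account_risk, threshold=0.5):
    """Crisp decision from the fuzzy score; the threshold is an assumption."""
    return fuzzy_spam_score(cnn_score, account_risk) >= threshold
```

For example, `is_spam(0.9, 0.2)` flags a tweet whose text looks strongly spam-like even though the account-level score is low, while `is_spam(0.1, 0.9)` does not, because under these assumed rules a suspicious account alone yields a score below the threshold.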
10.57647/j.fomj.2025.0602.09
