Spam detection through feature selection using artificial neural network and sine–cosine algorithm
- Department of Engineering, Malard Branch, Islamic Azad University, Tehran, IR
Published in Issue 2020-04-30
How to Cite
Talaei Pashiri, R., Rostami, Y., & Mahrami, M. (2020). Spam detection through feature selection using artificial neural network and sine–cosine algorithm. Mathematical Sciences, 14(3) (September 2020). https://doi.org/10.1007/s40096-020-00327-8
Abstract
Distinguishing spam from non-spam emails is a major challenge for email service providers and users alike. Spam emails waste Internet traffic and often contain malicious links, most of which direct users to phishing webpages. Spam also helps spread malware across networks, which further underlines the need for detection. Although data mining methods such as artificial neural networks (ANNs) have been applied to spam detection, they are prone to significant output error, mostly because all spam features are included in the training stage. To reduce the spam detection error, this paper presents a feature selection-based method using the sine–cosine algorithm (SCA). In the proposed method, feature vectors are updated by the SCA to select the optimal features for training the ANN. Implemented in MATLAB on the Spambase dataset, the proposed method achieved a precision, accuracy and sensitivity of 98.64%, 97.92% and 98.36%, respectively. It outperformed the multilayer perceptron (MLP) neural network, Bayesian network, decision tree and random forest classifiers in spam detection. According to the test results, the SCA reduced the feature selection error of the MLP neural network by approximately 2.18%.
Keywords
- Email spam,
- Sine–cosine algorithm,
- Metaheuristic algorithms,
- Data mining
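The abstract describes feature vectors being updated by the SCA to pick the features used to train the ANN. The following is a minimal Python sketch of that idea, not the authors' MATLAB implementation. It uses the standard sine-cosine position update with a linearly decreasing amplitude. Several parts are assumptions made only for illustration: the 0.5 threshold that turns continuous positions into a binary feature mask, the function name `sca_feature_selection`, and the stand-in `toy_fitness`. In a real pipeline the fitness would be the validation error of an ANN trained on the selected columns.

```python
import math
import random


def sca_feature_selection(fitness, n_features, n_agents=20, n_iters=100, a=2.0, seed=0):
    """Minimise fitness(mask) over binary feature masks with a sine-cosine search."""
    rng = random.Random(seed)
    # Continuous positions in [0, 1]; assumption: a feature counts as
    # selected when its coordinate exceeds 0.5.
    agents = [[rng.random() for _ in range(n_features)] for _ in range(n_agents)]

    def to_mask(pos):
        return [x > 0.5 for x in pos]

    best_pos, best_fit = None, float("inf")

    def evaluate_all():
        nonlocal best_pos, best_fit
        for pos in agents:
            f = fitness(to_mask(pos))
            if f < best_fit:
                best_fit, best_pos = f, pos[:]

    for t in range(n_iters):
        evaluate_all()                 # best_pos acts as the destination point
        r1 = a - t * a / n_iters       # amplitude shrinks linearly from a towards 0
        for pos in agents:
            for j in range(n_features):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                # r4 switches between the sine and the cosine branch.
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best_pos[j] - pos[j])
                pos[j] = min(1.0, max(0.0, pos[j] + step))  # keep inside [0, 1]

    evaluate_all()                     # score the positions from the last update
    return to_mask(best_pos), best_fit


# Hypothetical stand-in for "ANN validation error": features 0, 3 and 5 are
# treated as informative, and every wrongly included or excluded feature
# adds one unit of error.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    return sum(1 for j, selected in enumerate(mask) if selected != (j in INFORMATIVE))


mask, err = sca_feature_selection(toy_fitness, n_features=8)
print("selected features:", [j for j, s in enumerate(mask) if s], "error:", err)
```

Swapping `toy_fitness` for a function that trains a classifier on the masked columns and returns its held-out error would turn this into a wrapper-style feature selector. Each fitness call then costs one training run, so the agent count and iteration budget would need to be chosen accordingly.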