Deep Learning Based Cancer Classification Using Selected Gene Expression Data
- Department of Biomedical Engineering, ST.C., Islamic Azad University, Tehran, Iran
- Department of Electrical Engineering, ST.C., Islamic Azad University, Tehran, Iran
- Department of Biomedical Engineering, SR.C., Islamic Azad University, Tehran, Iran
Received: 2025-06-24
Revised: 2025-07-19
Accepted: 2025-08-04
Published in Issue 2025-09-30
Copyright (c) 2025 Setare Tabasi, Iman Ahanian, Nasrin Amiri, Nader Jafarnia Dabanloo, Hamide Barghamadi (Author)

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Abstract
In 2019, 23.6 million new cancer cases and 10 million cancer deaths were reported, increases of 26% and 21%, respectively, over the preceding decade. Gene expression (GE) analysis is a robust technique for the early diagnosis and classification of cancers because it identifies molecular characteristics unique to different organs. Compared with other methods, identifying cancer-related genes from GE profiles leads to more effective and tailored therapies. The technique has drawbacks, however: the sheer volume of available GE data makes extraction and analysis challenging, comprehensive databases are scarce, and some regions have poor access to GE data extraction technology. This study proposes DL-mRMR, a deep learning (DL)-based minimum-redundancy-maximum-relevance (mRMR) technique for the precise identification of six cancer types from GE data. DL-mRMR combines DL with feature selection (FS) to pick effective features and reduce the dimensionality of the dataset. Building on FS, DL-mRMR reduces the number of input features that reflect similarities between cancers and within samples of a given tumor, then balances the samples in the database, and finally diagnoses the cancer type with a DL classifier. In simulations on the TCGA database, DL-mRMR achieved 99% accuracy (ACC) in differentiating breast, colorectal, kidney, and lung cancers from GE data. In addition, because the input sample vector is smaller, the classifier is less complex and requires fewer training samples than similar approaches.
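The feature selection step named in the abstract can be illustrated with a minimal sketch of generic greedy mRMR, using the mutual-information difference criterion (relevance minus mean redundancy). This is not the authors' DL-mRMR pipeline: the equal-width binning, the function names, and the toy expression values are all assumptions made for illustration, and the DL classifier and sample balancing are omitted.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def discretize(column, bins=3):
    """Equal-width binning of one continuous column (an assumed choice)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # avoid division by zero on constant columns
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy."""
    disc = [discretize(c) for c in columns]
    relevance = [mutual_information(d, labels) for d in disc]
    # Start from the single most relevant feature.
    selected = [max(range(len(disc)), key=relevance.__getitem__)]
    redundancy_sum = [0.0] * len(disc)
    while len(selected) < min(k, len(disc)):
        last = selected[-1]
        best, best_score = None, -math.inf
        for j in range(len(disc)):
            if j in selected:
                continue
            # Accumulate redundancy with the most recently selected feature.
            redundancy_sum[j] += mutual_information(disc[j], disc[last])
            score = relevance[j] - redundancy_sum[j] / len(selected)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Hypothetical toy data: 8 samples, 4 classes, 4 "genes" (one list per gene).
labels = [0, 0, 1, 1, 2, 2, 3, 3]
columns = [
    [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],    # separates classes {0,1} from {2,3}
    [2.0, 2.4, 1.8, 2.2, 10.0, 10.4, 9.8, 10.2],  # scaled copy of gene 0 (redundant)
    [1.0, 1.1, 5.0, 5.1, 0.9, 1.2, 4.9, 5.2],    # separates classes {0,2} from {1,3}
    [1.0, 5.0, 1.1, 5.1, 0.9, 4.9, 1.2, 5.2],    # uninformative about the class
]

print(mrmr_select(columns, labels, 2))
```

On this toy data the second pick is the complementary gene 2 and not the redundant copy gene 1, even though both are equally relevant on their own. A relevance-only ranking cannot make that distinction.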
Keywords
- Gene expression
- Cancer classification
- Deep learning
- Feature reduction
- mRMR
DOI: 10.57647/j.spre.2025.090318