Dimension Reduction of Big Data and Deleting Noise and Its Efficiency in the Decision Tree Method and Its Use in Covid 19

Fazel Badakhshan Farahabadi; Kianoush Fathi Vajargah; Rahman Farnoosh

doi:10.30495/ijm2c.2022.1947200.1239


Fazel Badakhshan Farahabadi¹, Kianoush Fathi Vajargah*², Rahman Farnoosh³

¹ Department of Statistics, Islamic Azad University, Science and Research Branch, Tehran, Iran
² Department of Statistics, Islamic Azad University, Tehran North Branch
³ School of Mathematics, Iran University of Science and Technology, Tehran, 16844, Iran

Received: 13-12-2021

Accepted: 10-02-2022

Published in Issue 01-09-2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Badakhshan Farahabadi, F., Fathi Vajargah, K., & Farnoosh, R. (2022). Dimension Reduction of Big Data and Deleting Noise and Its Efficiency in the Decision Tree Method and Its Use in Covid 19. International Journal of Mathematical Modelling & Computations, 12(3), 183-190. https://doi.org/10.30495/ijm2c.2022.1947200.1239

Abstract

With the advancement of science and technology, data are generated at high speed. As the size and volume of data grow, analysts often face a large number of dimensions together with redundant and noisy data, which makes analysis difficult. Reducing the dimension of the data without losing useful information is therefore an important step in preparing data for data mining, and it can increase the speed and even the accuracy of the analysis. In this research, we present a dimension reduction method based on a copula function, which reduces the dimension of the data by identifying the relationships among the variables. The copula function provides a good model of dependence for comparing multivariate distributions and so helps to identify those relationships. Specifically, by fitting an appropriate copula function to the data and estimating its parameter, we measure the structural correlation of the variables and eliminate variables that are highly structurally correlated with each other. In this way, the proposed method uses the copula function to identify noisy data and data with many common features and removes them from the original data.
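The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the copula family, the estimator, or the cut-off, so everything below is an assumption made for illustration: a Gaussian copula, a correlation matrix estimated from rank-based normal scores, a greedy rule that keeps a variable only if its copula correlation with every kept variable stays below a threshold, and the hypothetical names `normal_scores`, `gaussian_copula_corr`, and `select_columns`. It is not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(X):
    """Map each column to standard-normal scores through its ranks."""
    n = X.shape[0]
    # Pseudo-observations in (0, 1): ranks scaled by n + 1.
    u = np.apply_along_axis(rankdata, 0, X) / (n + 1)
    return norm.ppf(u)


def gaussian_copula_corr(X):
    """Estimate the Gaussian copula correlation matrix (assumed family)."""
    return np.corrcoef(normal_scores(X), rowvar=False)


def select_columns(X, threshold=0.9):
    """Greedily keep columns that are not strongly dependent on a kept column.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    R = np.abs(gaussian_copula_corr(X))
    kept = []
    for j in range(X.shape[1]):
        if all(R[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic data: column 1 is a monotone (nonlinear) function of column 0,
# column 2 is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([a, np.exp(a), b])

kept = select_columns(X, threshold=0.9)
print(kept)  # [0, 2]
```

Because the copula is built from ranks, the redundant column is detected even though its relationship to the first column is nonlinear. The reduced matrix `X[:, kept]` could then be passed to a decision tree classifier, which is the downstream use the title refers to.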


