Multicollinearity among predictor variables is a frequent issue in longitudinal data analysis. In this context, this paper proposes a mixed ridge regression model, estimated via shrinkage methods, to analyze such data. Furthermore, to obtain more efficient estimators, we propose preliminary test and Stein-type estimators that use prior information on the fixed-effects parameters. The model parameters are estimated via the EM algorithm. A simulation study is presented to assess the performance of the estimators under different estimation methods. An application to HIV data is also presented.
In a longitudinal data setup, repeated measures of some variables of interest are collected over a specified time period for different independent subjects or individuals. Such data are commonly encountered in medical research, where the responses are subject to various time-dependent and time-constant effects such as pre- and post-treatment types, gender effect and baseline measures, among others (see Mamode Khan et al. [ 1 ], Yuan et al. [ 2 ], Verbeke et al. [ 3 ], Temesgen and Kebede [ 4 ], Seyoum et al. [ 5 ] and the references therein). In the above examples, it is natural for the repeated measures to exhibit some form of dependence that may result from serial or random effects, as outlined by Zeger and Liang [ 6 ], Thall and Vail [ 7 ], Laird and Ware [ 8 ], Sutradhar [ 9 ] and Sutradhar and Jowaheer [ 10 ]. Thus, the main purpose of longitudinal studies is to estimate the effects of the various parameters and determine their significance, while the dependence estimate is treated as secondary. In this context, FitzMaurice and Laird [ 11 ] and Sutradhar et al. [ 12 ] have proposed various likelihood-based and pseudo-likelihood-based estimation procedures to estimate the regression effects, but the efficiency of the estimators in these approaches may be questionable, in particular under multi-collinearity among the predictor variables, as considered by Eliot et al. [ 13 ], Hossain et al. [ 14 ] and Saleh et al. [ 15 ].
Since longitudinal data mostly arise from clinical studies, expert knowledge about the parameters has a vital impact on the output and is thus an important component in the estimation of model parameters. Preliminary test and shrinkage techniques are the mechanisms most commonly used to incorporate prior knowledge at the estimation stage (see Ali and Saleh [ 16 ], Ahmed and Fallahpour [ 17 ], Roozbeh and Arashi [ 18 ], Yuzbasi and Ahmed [ 19 ], Yuzbasi et al. [ 20 ], Asar [ 21 ] and the references therein). In this paper, we develop preliminary test and shrinkage estimation methods for the analysis of longitudinal data in the ridge regression context, where some parameters are subject to certain or uncertain restrictions. In this way, we improve the estimation technique in both the mean squared error (MSE) and mean prediction error (MPE) senses.
We begin with the linear mixed effects (LM) model given by
The log-likelihood function of
Consider a situation in which the multi-collinearity problem is present. The ridge regression approach, designed specifically to handle correlated predictors, involves introducing a shrinkage penalty k to the least squares equation, and subsequently solving for the value of
In this section, we consider the mixed ridge (MR) model ( 2 ) and develop preliminary test and Stein-type estimation of the fixed-effects parameter
In the statistical literature, preliminary test estimation of parameters was introduced by Bancroft [ 22 ] to estimate the parameters of a model when it is suspected that some “uncertain prior information” (UPI) on the parameter of interest is available. The method involves a statistical test of the UPI based on an appropriate statistic and a decision on whether the model-based sample estimate or the prior information-based estimate of the model parameters should be taken.
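The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the estimator developed in this paper: the function name, its arguments and the numeric inputs are hypothetical, and the test statistic and critical value are assumed to have been computed elsewhere.

```python
from typing import List, Sequence


def preliminary_test_estimate(
    unrestricted: Sequence[float],
    restricted: Sequence[float],
    test_statistic: float,
    critical_value: float,
) -> List[float]:
    """Pick between two estimates based on a test of the prior information.

    If the test does not reject the uncertain prior information (UPI),
    the prior-information-based (restricted) estimate is returned;
    otherwise the model-based (unrestricted) sample estimate is returned.
    """
    if test_statistic <= critical_value:
        return list(restricted)
    return list(unrestricted)


# Hypothetical inputs: the UPI sets the second coefficient to zero.
sample_estimate = [0.52, 0.03]
prior_estimate = [0.50, 0.00]

# 3.84 is approximately the 95% quantile of a chi-square with 1 df.
kept_prior = preliminary_test_estimate(sample_estimate, prior_estimate, 1.2, 3.84)
kept_sample = preliminary_test_estimate(sample_estimate, prior_estimate, 7.5, 3.84)
```

The result is an all-or-nothing choice between the two estimates, which is why the outcome depends on the chosen level of significance through the critical value.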
In our case, if we suspect
As a result of this test, we choose
The PTMR estimator is highly dependent on the level of significance
Stein-type estimation was introduced by Stein [ 23 ] and James and Stein [ 24 ] in the statistical literature. It combines UPI on the parameters of interest with the sample observations from the statistical model. In the context of the LM model, using the same approach as in Saleh [ 15 ], the Stein-type estimator of
Notice that the forms of
Consider the setting in which the variance parameters
Further, for ease of computation, we use the estimate of Hoerl and Kennard [ 25 ] for the ridge parameter as the initial value. It is given by
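Several variants of this estimate are in use. As a minimal sketch, assuming the widely used form k = p·σ̂² / (β̂ᵀβ̂), often attributed to Hoerl, Kennard and Baldwin, an initial ridge parameter can be computed from the least squares coefficients and the residual variance. The function name and the inputs are hypothetical, and this form may differ from the variant used in this paper.

```python
from typing import Sequence


def hkb_ridge_parameter(beta_ols: Sequence[float], sigma2_hat: float) -> float:
    """Initial ridge parameter k = p * sigma2_hat / (beta' beta).

    beta_ols   : least squares estimates of the p regression coefficients
    sigma2_hat : estimated residual variance from the least squares fit
    (Assumed form; other Hoerl-Kennard variants exist.)
    """
    p = len(beta_ols)
    sum_of_squares = sum(b * b for b in beta_ols)
    if sum_of_squares == 0.0:
        raise ValueError("all coefficients are zero; k is undefined")
    return p * sigma2_hat / sum_of_squares


# Hypothetical values: two coefficients and a residual variance of 0.5.
k_initial = hkb_ridge_parameter([1.0, 2.0], 0.5)  # 2 * 0.5 / 5
```

Under this form, small coefficients relative to the noise variance give a larger k and hence stronger shrinkage.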

A Monte Carlo simulation study is conducted to evaluate the performance of the proposed PTMR and shrinkage estimators compared to the MR estimator of Eliot et al. [ 13 ]. In our simulation scheme, we fix
Table 1 Stein-type estimation for mixed and MR

          M                   MR                  Shrinkage M         Shrinkage MR
          Estimate  sd        Estimate  sd        Estimate  sd        Estimate  sd
0.0  0.0  −0.01651  0.01897    0.03828  0.01708   −0.01585  0.01833    0.03707  0.01651
     0.1   0.09016  0.01864    0.11907  0.01593    0.08725  0.01802    0.11513  0.01530
     0.2   0.19841  0.01832    0.21519  0.01559    0.19189  0.01777    0.20805  0.01498
     0.4   0.38544  0.02153    0.37015  0.01870    0.37298  0.02110    0.35803  0.01861
     0.8   0.83942  0.01785    0.75063  0.01868    0.81200  0.01640    0.72591  0.01979
     MSE   0.294153            0.243336            0.271858            0.243966
0.2  0.0  −0.00962  0.02271    0.06112  0.02014   −0.00914  0.02193    0.05919  0.01947
     0.1   0.09251  0.02057    0.12655  0.01728    0.08954  0.01989    0.12235  0.01659
     0.2   0.19622  0.02042    0.21971  0.01648    0.18980  0.01980    0.21244  0.01581
     0.4   0.37928  0.02446    0.35959  0.02064    0.36707  0.02396    0.34783  0.02057
     0.8   0.83806  0.01875    0.72513  0.02178    0.81081  0.01734    0.70131  0.02312
     MSE   0.370953            0.312257            0.344432            0.307289
0.5  0.0  −0.00016  0.02968    0.07565  0.02530    0.00004  0.02866    0.07331  0.02445
     0.1   0.09685  0.02508    0.14022  0.02006    0.09381  0.02424    0.13562  0.01926
     0.2   0.19709  0.02523    0.22438  0.01961    0.19065  0.02445    0.21697  0.01883
     0.4   0.35966  0.03151    0.34586  0.02530    0.34827  0.03089    0.33473  0.02511
     0.8   0.84249  0.02363    0.70636  0.02399    0.81521  0.02205    0.68334  0.02541
     MSE   0.595086            0.442459            0.55458             0.440721
0.7  0.0   0.01952  0.03965    0.11723  0.03073    0.01919  0.03831    0.11360  0.02973
     0.1   0.09350  0.03343    0.16157  0.02493    0.09062  0.03233    0.15625  0.02394
     0.2   0.18086  0.03260    0.22593  0.02314    0.17494  0.03164    0.21845  0.02226
     0.4   0.35057  0.03940    0.32776  0.02927    0.33946  0.03852    0.31720  0.02904
     0.8   0.85111  0.03275    0.65915  0.03311    0.82374  0.03098    0.63787  0.03439
     MSE   1.026224            0.691952            0.958517            0.686111
0.9  0.0   0.06336  0.06138    0.21356  0.03809    0.06172  0.05929    0.21040  0.03704
     0.1   0.05429  0.05000    0.21684  0.02302    0.05289  0.04841    0.21351  0.02192
     0.2   0.15672  0.05137    0.20109  0.02179    0.15134  0.04986    0.19811  0.02068
     0.4   0.31727  0.05493    0.29482  0.02799    0.30729  0.05360    0.28893  0.02728
     0.8   0.90350  0.05326    0.52754  0.05286    0.87430  0.05059    0.51431  0.05294
     MSE   2.387184            1.109869            2.226858            1.083424

Table 2 Preliminary test estimation for MR

          PTMR                PTMR                PTMR                PTMR
          Estimate  sd        Estimate  sd        Estimate  sd        Estimate  sd
0.0  0.0   0.01246  0.00856    0.03605  0.01578    0.03828  0.01708    0.03828  0.01708
     0.1   0.02731  0.01393    0.10728  0.01466    0.11907  0.01593    0.11907  0.01593
     0.2   0.05582  0.02647    0.18226  0.01764    0.21519  0.01559    0.21519  0.01559
     0.4   0.09815  0.04943    0.32858  0.02632    0.37015  0.01870    0.37015  0.01870
     0.8   0.19518  0.09782    0.66143  0.04317    0.75063  0.01868    0.75063  0.01868
     MSE   2.55996             0.55901             0.24334             0.24334
0.2  0.0   0.01804  0.01010    0.05750  0.01853    0.06112  0.02014    0.06112  0.02014
     0.1   0.03039  0.01426    0.11355  0.01621    0.12655  0.01728    0.12655  0.01728
     0.2   0.06393  0.02630    0.18625  0.01822    0.21971  0.01648    0.21971  0.01648
     0.4   0.10230  0.04901    0.31808  0.02757    0.35959  0.02064    0.35959  0.02064
     0.8   0.20385  0.09668    0.63905  0.04445    0.72513  0.02178    0.72513  0.02178
     MSE   2.50707             0.62413             0.31226             0.31226
0.5  0.0   0.02520  0.01120    0.07434  0.02222    0.07565  0.02530    0.07565  0.02530
     0.1   0.02988  0.01521    0.12458  0.01873    0.14022  0.02006    0.14022  0.02006
     0.2   0.05970  0.02746    0.18715  0.02045    0.22438  0.01961    0.22438  0.01961
     0.4   0.09445  0.04990    0.31025  0.03054    0.34586  0.02530    0.34586  0.02530
     0.8   0.17984  0.09789    0.61851  0.04538    0.70636  0.02399    0.70636  0.02399
     MSE   2.61252             0.72805             0.44246             0.44246
0.7  0.0   0.03759  0.01581    0.11163  0.02780    0.11723  0.03073    0.11723  0.03073
     0.1   0.03963  0.01675    0.14117  0.02351    0.16157  0.02493    0.16157  0.02493
     0.2   0.06801  0.02819    0.18568  0.02332    0.22593  0.02314    0.22593  0.02314
     0.4   0.09917  0.04982    0.29229  0.03410    0.32776  0.02927    0.32776  0.02927
     0.8   0.20379  0.09593    0.58347  0.04987    0.65915  0.03311    0.65915  0.03311
     MSE   2.55001             0.95592             0.69195             0.69195
0.9  0.0   0.05136  0.01859    0.17949  0.03486    0.21356  0.03809    0.21356  0.03809
     0.1   0.06283  0.01780    0.18461  0.02241    0.21684  0.02302    0.21684  0.02302
     0.2   0.04737  0.02795    0.15783  0.02346    0.20109  0.02179    0.20109  0.02179
     0.4   0.07633  0.05073    0.25388  0.03453    0.29482  0.02799    0.29482  0.02799
     0.8   0.14872  0.10104    0.47422  0.06376    0.52754  0.05286    0.52754  0.05286
     MSE   2.83244             1.37271             1.10987             1.10987
From Table 1, it is apparent that the shrinkage mixed ridge (shrinkage MR) estimator generally has smaller MSE and standard deviation (sd). Hence, the shrinkage MR is the best among all competitors; i.e., it performs better than the mixed, MR and shrinkage mixed (shrinkage M) estimators. Knowing this, the preliminary test approach is applied only to the mixed ridge estimator, giving rise to the PTMR estimator. According to the results of Table 2, as the level of significance increases, the MSE increases. The graphs of the MSE against the different values of
Fig. 1 MSE of estimators versus

Although the MSE values increase with the level of multi-collinearity, the proposed PTMR estimator has the smallest MSE among all estimators. Further, the PTMR and shrinkage MR estimators perform better than the M and MR estimators in multi-collinear situations.
Following a framework similar to that of Temesgen and Kebede [ 4 ], this section focuses on analyzing HIV data using the linear mixed model. In particular, we analyze the performance of the proposed estimators using the aids dataset from the “JMbayes” package in R. The dataset consists of seven covariates for each
Table 3 Introduction to data and variables format

Variables  Description
           Patients identifier; in total there are 467 patients
           The time to death or censoring
           A numeric vector with 0 denoting censoring and 1 death
           The CD4 cells counts
           The time points at which the CD4 cells count was recorded
           A factor with levels
           A factor with levels
           A factor with levels AIDS denoting previous opportunistic infection (AIDS diagnosis) at study entry, and noAIDS denoting no previous infection
           A factor with levels
Variables have been measured
Table 4 Summary of dataset

Min       0.47   0.00   0.00
1st. Q   12.23   3.16   0.00
Median   14.07   5.47   2.00
Mean     13.89   7.02   4.21
3rd. Q   17.00  10.44   6.00
Max      21.40  24.12  18.00

Death: 412      ddI: 688  Male: 1288   AIDS: 863    Failure: 491
Censoring: 993  ddC: 717  Female: 117  noAIDS: 542  Intolerance: 914
In addition, we use shrinkage methods to increase estimation efficiency. For the purpose of utilizing the mixed model in the “Shrinkage approach” section, the log transform was applied to the CD4 counts.
Table 5 Estimations of real data

Variables  M            MR          Shrinkage M  Shrinkage MR  PTMR
Time        0.06869698   0.2073583   0.02856897   0.08031050    0.2073583
Death      −2.04918509  −1.0093676  −0.85219338  −0.39093109   −1.0093676
Obstime    −0.14989040  −0.1434991  −0.06233483  −0.05557763   −0.1434991
Drug        0.52531199   0.3335692   0.68193879   0.49110137    0.3335692
AZT         0.66381892   0.1724118   0.27606198   0.06677560    0.1724118
MPE         1.20687      0.02114     1.04644      0.00052       0.02114

Table 6 Standard deviation (sd) estimations

Variables  M             MR         Shrinkage M   Shrinkage MR  PTMR
Time       0.0006130127  1.666749e  0.0005790592  9.235027e     1.666749e
Death      0.0157215596  1.170777e  0.0152335866  3.478280e     1.170777e
Obstime    0.0011262431  4.908799e  0.0010734227  8.779722e     4.908799e
Drug       0.0374385567  1.637019e  0.0344460129  4.589694e     1.637019e
AZT        0.0044424233  4.234534e  0.0041215722  8.977002e     4.234534e
Table 5 shows the mixed, MR, shrinkage M, shrinkage MR and PTMR estimators, respectively, denoted by
From the medical point of view as well, ddI has been shown to be more effective in controlling the growth of HIV in the human body (see Molina et al. [ 26 ]), while use of the drug ddC (zalcitabine) has been strongly discouraged because of its adverse effects, as discussed in the book “Clinical Neurotoxicology” by Dobbs [ 27 ] and by Bilgrami and O’Keefe [ 28 ].
To compare the performance of the shrinkage MR estimator, we evaluate the MPE; the smaller, the better. In what follows, we describe the scheme used to derive the MPE. For our purpose, K-fold cross-validation is used to obtain an estimate of the prediction errors of the model. In K-fold cross-validation, the dataset is randomly divided into K subsets of roughly equal size. One subset is left aside,
Our results are based on
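The K-fold scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the results reported here: a simple least squares line stands in for the mixed ridge fit, the prediction error of a fold is taken as the mean squared difference between observed and predicted responses, and all function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly split the indices 0..n-1 into k folds of roughly equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares intercept and slope (stand-in for the model fit)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_errors(
    x: Sequence[float], y: Sequence[float], k: int, seed: int = 0
) -> List[float]:
    """Prediction error of each held-out fold, fitting on the other k-1 folds."""
    errors = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(y[i] - (intercept + slope * x[i])) ** 2 for i in held_out]
        errors.append(sum(squared) / len(squared))
    return errors


# Toy data lying exactly on a line, so every fold is predicted almost perfectly.
x_toy = [float(i) for i in range(20)]
y_toy = [2.0 + 0.5 * xi for xi in x_toy]
pes = cross_validated_errors(x_toy, y_toy, k=5)
mpe = sum(pes) / len(pes)  # mean prediction error over the K folds
```

With repeated measures, one would typically assign whole subjects rather than single observations to folds, so that measurements from the same patient do not appear in both the training and held-out sets; whether that was done here is not stated.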
Fig. 2 Box plots for the PEs of the real data

From the results in the estimation table, it can be deduced that the drug didanosine (ddI) provides the better treatment.
In this paper, we developed preliminary test and Stein-type ridge regression estimation in the linear mixed model for longitudinal data analysis. To this end, we considered a penalized likelihood approach and proposed the shrinkage mixed ridge estimator for the vector of regression coefficients. An EM algorithm was also presented to solve the penalized likelihood for the unknown parameters. Simulation studies demonstrated the good performance of the proposed estimator in multicollinear situations compared to the maximum likelihood estimator. In addition, the model supports the use of didanosine in improving the health of HIV patients, in line with various biomedical studies. Hence, such a model and its shrinkage-based estimation step are well suited to medical studies of this kind.
We would like to thank the referees for their constructive comments, which greatly improved the presentation of the paper.