This article presents a parametric bootstrap (PB) approach to inference on the regression coefficients in panel data models. We aim to propose a method that is easy to apply, both for hypothesis testing and for constructing confidence regions for the vector of regression coefficients in balanced and unbalanced panel data models. A Monte Carlo simulation study compares our parametric bootstrap approach with other approaches, including an approximate method.

PB inferences for the regression coefficients

Balanced panel data models

Panel data regression models describe the effect of several explanatory variables on a response variable observed for $N$ individuals over $T$ time periods. A panel data model is

$$Y_{it} = \alpha + x_{it}'\beta + u_{it}, \tag{2.1}$$

with

$$u_{it} = \mu_i + \nu_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,$$

where $Y_{it}$ and $x_{it}$ are the response value and the $K$ explanatory variables for the $i$th individual in the $t$th time period, respectively, $u_{it}$ is the regression disturbance, $\mu_i$ denotes the unobservable individual-specific effect and $\nu_{it}$ denotes the remainder disturbance. Usually, in the random effects model, we suppose that $\mu_i \sim N(0, \sigma_\mu^2)$ and $\nu_{it} \sim N(0, \sigma_\nu^2)$ vary independently. Here $\alpha$ is the intercept and $\beta$ is a $K \times 1$ vector of unknown coefficients. Let $y_{it}$ denote the observed value of $Y_{it}$ for $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$.

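Equation (2.1) can also be expressed in matrix notation as

$$Y = \alpha 1_{NT} + X\beta + Z_\mu \mu + \nu = Z\delta + u, \tag{2.2}$$

where $Y = (Y_{11}, \ldots, Y_{1T}, \ldots, Y_{N1}, \ldots, Y_{NT})'$, $X$ is an $NT \times K$ matrix, $Z = [1_{NT}, X]$, $\delta = (\alpha, \beta')'$ is the unknown vector of regression coefficients, $Z_\mu = I_N \otimes 1_T$, $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$, $\nu = (\nu_{11}, \ldots, \nu_{1T}, \ldots, \nu_{N1}, \ldots, \nu_{NT})'$, $u = Z_\mu \mu + \nu$, $I_N$ is the identity matrix of order $N$, $1_T$ denotes the $T \times 1$ vector whose elements are all ones and $\otimes$ denotes the Kronecker product.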

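Let $J_T = 1_T 1_T'$, $\bar{J}_T = \frac{1}{T} J_T$ and $E_T = I_T - \bar{J}_T$. Then, the covariance matrix of $Y$ is

$$\operatorname{Cov}(Y) = \Sigma = \sigma_\mu^2 (I_N \otimes J_T) + \sigma_\nu^2 (I_N \otimes I_T) = \sigma_1^2 P + \sigma_\nu^2 Q, \tag{2.3}$$

where $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$, $P = I_N \otimes \bar{J}_T$ and $Q = I_N \otimes E_T$. Equation (2.3) is the spectral decomposition of $\Sigma$, which is the key to the inferences that follow. Both $P$ and $Q$ are symmetric and idempotent matrices such that $PQ = QP = 0$. Using these properties of $P$ and $Q$, [ ¹⁵ ] shows that

$$\Sigma^r = (\sigma_1^2)^r P + (\sigma_\nu^2)^r Q, \tag{2.4}$$

where $r$ is an arbitrary scalar. Hence,

$$\Sigma^{-1} = (\sigma_1^2)^{-1} P + (\sigma_\nu^2)^{-1} Q. \tag{2.5}$$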
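The spectral decomposition and the resulting formula for powers of $\Sigma$ can be checked numerically. The following is a minimal NumPy sketch under illustrative assumptions: the function names `projections` and `sigma_power` and the numeric values are hypothetical and not part of the original derivation.

```python
import numpy as np

def projections(N, T):
    """Return P = I_N (x) Jbar_T and Q = I_N (x) E_T for a balanced panel."""
    J_bar = np.full((T, T), 1.0 / T)      # Jbar_T = (1/T) 1_T 1_T'
    E = np.eye(T) - J_bar                 # E_T = I_T - Jbar_T
    return np.kron(np.eye(N), J_bar), np.kron(np.eye(N), E)

def sigma_power(sigma_mu2, sigma_nu2, N, T, r):
    """Sigma^r = (sigma_1^2)^r P + (sigma_nu^2)^r Q, sigma_1^2 = T sigma_mu^2 + sigma_nu^2."""
    P, Q = projections(N, T)
    sigma_1_2 = T * sigma_mu2 + sigma_nu2
    return sigma_1_2 ** r * P + sigma_nu2 ** r * Q

# Hypothetical values chosen only for illustration.
N, T, sigma_mu2, sigma_nu2 = 4, 3, 2.0, 0.5
P, Q = projections(N, T)

# Direct construction: sigma_mu^2 (I_N (x) J_T) + sigma_nu^2 I_NT.
Sigma = sigma_mu2 * np.kron(np.eye(N), np.ones((T, T))) + sigma_nu2 * np.eye(N * T)
Sigma_inv = sigma_power(sigma_mu2, sigma_nu2, N, T, -1)

print(np.allclose(Sigma, sigma_power(sigma_mu2, sigma_nu2, N, T, 1)))
print(np.allclose(Sigma @ Sigma_inv, np.eye(N * T)))
```

Because only two scalars need to be inverted, no $NT \times NT$ matrix inversion is required; the same function with $r = 1/2$ gives a square root of $\Sigma$ that is convenient for simulating from $N(0, \Sigma)$.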

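The generalized least squares estimator (GLSE) of $\delta$ is obtained by [ ¹² ] as

$$\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) = (Z'\Sigma^{-1}Z)^{-1} Z'\Sigma^{-1} Y. \tag{2.6}$$

It is easy to verify that

$$\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) \sim N\left(\delta, (Z'\Sigma^{-1}Z)^{-1}\right).$$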

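To obtain estimators of $\sigma_\nu^2$ and $\sigma_1^2$, model (2.2) is transformed as follows:

$$\begin{pmatrix} QY \\ PY \end{pmatrix} = \begin{pmatrix} QZ \\ PZ \end{pmatrix}\delta + \begin{pmatrix} Qu \\ Pu \end{pmatrix} = \begin{pmatrix} QX\beta \\ PZ\delta \end{pmatrix} + \begin{pmatrix} Q\nu \\ Pu \end{pmatrix}.$$

It is easy to show that $QY \sim N(QX\beta, \sigma_\nu^2 Q)$ and $PY \sim N(PZ\delta, \sigma_1^2 P)$, and that $PY$ and $QY$ are mutually independent, since

$$\operatorname{Cov}\begin{pmatrix} QY \\ PY \end{pmatrix} = \begin{pmatrix} \sigma_\nu^2 Q & 0 \\ 0 & \sigma_1^2 P \end{pmatrix}.$$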

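Therefore, we can define

$$S_1^2 = Y'PY - Y'PZ(Z'PZ)^{-1}Z'PY, \qquad S_\nu^2 = Y'QY - Y'QX(X'QX)^{-1}X'QY, \tag{2.9}$$

such that $S_1^2$ and $S_\nu^2$ are independently distributed as

$$\frac{S_1^2}{\sigma_1^2} \sim \chi^2_{(N-K-1)}, \qquad \frac{S_\nu^2}{\sigma_\nu^2} \sim \chi^2_{(N(T-1)-K)},$$

where $\chi^2_{(m)}$ denotes a central chi-square random variable with $m$ degrees of freedom. Then, the unbiased estimators of $\sigma_1^2$, $\sigma_\nu^2$ and $\sigma_\mu^2$ are given by

$$\tilde{\sigma}_1^2 = \frac{S_1^2}{N-K-1}, \qquad \tilde{\sigma}_\nu^2 = \frac{S_\nu^2}{N(T-1)-K}, \qquad \text{and} \qquad \tilde{\sigma}_\mu^2 = \frac{1}{T}\left(\tilde{\sigma}_1^2 - \tilde{\sigma}_\nu^2\right).$$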
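The between and within residual sums of squares and the three variance estimators can be computed directly from the stacked data. The sketch below is only an illustration: the function name `variance_components`, the simulated design and the parameter values are assumptions, and observations are assumed to be stacked individual by individual.

```python
import numpy as np

def variance_components(y, X, N, T):
    """Unbiased estimates (sigma_1^2, sigma_nu^2, sigma_mu^2) for a balanced panel.

    y is the stacked NT-vector (individual by individual) and X is NT x K.
    """
    NT, K = X.shape
    Z = np.column_stack([np.ones(NT), X])
    P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # between projection
    Q = np.eye(NT) - P                                 # within projection

    def rss(A, W):
        # y'Wy - y'WA (A'WA)^{-1} A'Wy for a symmetric projection W
        Wy = W @ y
        AWy = A.T @ Wy
        return y @ Wy - AWy @ np.linalg.solve(A.T @ W @ A, AWy)

    s1 = rss(Z, P) / (N - K - 1)
    snu = rss(X, Q) / (N * (T - 1) - K)
    smu = (s1 - snu) / T          # can be negative in small samples
    return s1, snu, smu

# Simulated example with hypothetical true values sigma_mu^2 = 2, sigma_nu^2 = 1.
rng = np.random.default_rng(1)
N, T, K = 30, 5, 2
X = rng.normal(size=(N * T, K))
mu = np.repeat(rng.normal(scale=np.sqrt(2.0), size=N), T)
y = 1.0 + X @ np.array([0.5, -1.0]) + mu + rng.normal(size=N * T)
s1, snu, smu = variance_components(y, X, N, T)
print(s1, snu, smu)
```

Note that the difference-based estimate of $\sigma_\mu^2$ is unbiased but not guaranteed to be non-negative, whereas the estimates of $\sigma_1^2$ and $\sigma_\nu^2$ are always positive.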

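According to (2.4) and (2.5), the natural estimators of $\Sigma$ and $\Sigma^{-1}$ are, respectively,

$$\tilde{\Sigma} = \tilde{\sigma}_1^2 P + \tilde{\sigma}_\nu^2 Q \qquad \text{and} \qquad \tilde{\Sigma}^{-1} = (\tilde{\sigma}_1^2)^{-1} P + (\tilde{\sigma}_\nu^2)^{-1} Q.$$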

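When $\Sigma$ is known, a natural pivotal quantity for inferences on $\delta$ is given by

$$H^* = (\hat{\delta} - \delta)'(Z'\Sigma^{-1}Z)(\hat{\delta} - \delta) \sim \chi^2_{(K+1)}.$$

Then,

$$R_\delta = \left\{\delta \mid (\hat{\delta}_0 - \delta)'(Z'\Sigma^{-1}Z)(\hat{\delta}_0 - \delta) < \chi^2_{(\gamma, K+1)}\right\}$$

is an exact $100(1-\gamma)\%$ confidence region for $\delta$, where $\hat{\delta}_0$ is the observed value of $\hat{\delta}$, obtained by replacing $Y$ in (2.6) by $y$, and $\chi^2_{(\gamma, m)}$ stands for the lower $(1-\gamma)$th quantile of the central chi-square distribution with $m$ degrees of freedom.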

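The values of $\sigma_\nu^2$, $\sigma_\mu^2$ and hence $\Sigma$ are usually unknown in practice. Therefore, we propose to replace $\sigma_\nu^2$ and $\sigma_\mu^2$ with their unbiased estimators, which leads to

$$H = (\tilde{\delta} - \delta)'(Z'\tilde{\Sigma}^{-1}Z)(\tilde{\delta} - \delta), \tag{2.15}$$

where $\tilde{\delta} = (Z'\tilde{\Sigma}^{-1}Z)^{-1}Z'\tilde{\Sigma}^{-1}Y$ is a feasible GLSE.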

We can construct an approximate (AP) confidence region as

$$R_\delta^{AP} = \left\{\delta \mid (\tilde{\delta}_0 - \delta)'(Z'\tilde{\Sigma}^{-1}Z)(\tilde{\delta}_0 - \delta) < \chi^2_{(\gamma, K+1)}\right\},$$

where $\tilde{\delta}_0$ is the observed value of $\tilde{\delta}$. This approximate method is applicable when the sample size is large. Since the distribution of $H$ is unknown and the approximate method performs poorly (based on the simulation results), we use a parametric bootstrap approach to approximate the distribution of $H$.

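Let $s_\nu^2$ and $s_1^2$ be the observed values of $S_\nu^2$ and $S_1^2$ in (2.9), respectively. For a given $(\tilde{\delta}_0, s_1^2, s_\nu^2)$, let $Y_B \sim N(Z\tilde{\delta}_0, \tilde{\Sigma}_0)$, where $\tilde{\Sigma}_0$ is the observed value of $\tilde{\Sigma}$. Then, the PB pivot variable based on the random quantity (2.15) is

$$H_B = (\tilde{\delta}_B - \tilde{\delta}_0)'(Z'\tilde{\Sigma}_B^{-1}Z)(\tilde{\delta}_B - \tilde{\delta}_0), \tag{2.16}$$

where

$$\tilde{\delta}_B = (Z'\tilde{\Sigma}_B^{-1}Z)^{-1}Z'\tilde{\Sigma}_B^{-1}Y_B, \qquad \tilde{\Sigma}_B = \tilde{\sigma}_{1B}^2 P + \tilde{\sigma}_{\nu B}^2 Q,$$

$$\tilde{\sigma}_{1B}^2 = \frac{S_{1B}^2}{N-K-1}, \qquad \tilde{\sigma}_{\nu B}^2 = \frac{S_{\nu B}^2}{N(T-1)-K},$$

$$S_{1B}^2 = Y_B'PY_B - Y_B'PZ(Z'PZ)^{-1}Z'PY_B$$

and

$$S_{\nu B}^2 = Y_B'QY_B - Y_B'QX(X'QX)^{-1}X'QY_B.$$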

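The distribution of $H_B$ in (2.16), for a given $(\tilde{\delta}_0, s_1^2, s_\nu^2)$, does not depend on any unknown parameters. Therefore, we can construct a PB confidence region for the parameter $\delta$ based on the distribution of $H_B$. Let $H_\gamma^B$ denote the lower $(1-\gamma)$th quantile of $H_B$. Then, we propose the following $100(1-\gamma)\%$ confidence region for $\delta$:

$$R_\delta^B = \left\{\delta \mid (\tilde{\delta}_0 - \delta)'(Z'\tilde{\Sigma}_0^{-1}Z)(\tilde{\delta}_0 - \delta) < H_\gamma^B\right\}.$$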
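The quantile $H_\gamma^B$ has no closed form, so in practice it is approximated by simulation. The sketch below shows one way this could be done for a balanced panel; it is an illustration rather than the authors' code. The names `fit` and `pb_quantile`, the number of bootstrap draws and the simulated data are assumptions. Bootstrap samples are drawn using the square root $\tilde{\sigma}_1 P + \tilde{\sigma}_\nu Q$ of $\tilde{\Sigma}_0$, which follows from the formula for $\Sigma^r$ with $r = 1/2$.

```python
import numpy as np

def fit(y, Z, P, Q, N, T):
    """Feasible GLSE, W = Z' Sigma~^{-1} Z, and the two variance estimates."""
    K = Z.shape[1] - 1
    X = Z[:, 1:]

    def rss(A, W):
        # y'Wy - y'WA (A'WA)^{-1} A'Wy
        Wy = W @ y
        AWy = A.T @ Wy
        return y @ Wy - AWy @ np.linalg.solve(A.T @ W @ A, AWy)

    s1 = rss(Z, P) / (N - K - 1)
    snu = rss(X, Q) / (N * (T - 1) - K)
    Sigma_inv = P / s1 + Q / snu
    W = Z.T @ Sigma_inv @ Z
    delta = np.linalg.solve(W, Z.T @ Sigma_inv @ y)
    return delta, W, s1, snu

def pb_quantile(y, X, N, T, gamma=0.05, n_boot=2000, seed=0):
    """Monte Carlo approximation of H_gamma^B; also returns delta~_0 and W_0."""
    rng = np.random.default_rng(seed)
    NT = N * T
    Z = np.column_stack([np.ones(NT), X])
    P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    Q = np.eye(NT) - P
    delta0, W0, s1, snu = fit(y, Z, P, Q, N, T)
    root = np.sqrt(s1) * P + np.sqrt(snu) * Q      # square root of Sigma~_0
    mean = Z @ delta0
    H = np.empty(n_boot)
    for b in range(n_boot):
        yB = mean + root @ rng.standard_normal(NT)  # Y_B ~ N(Z delta~_0, Sigma~_0)
        dB, WB, _, _ = fit(yB, Z, P, Q, N, T)
        d = dB - delta0
        H[b] = d @ WB @ d
    return np.quantile(H, 1 - gamma), delta0, W0

# Simulated data with hypothetical parameter values.
rng = np.random.default_rng(42)
N, T = 10, 6
X = rng.normal(size=(N * T, 3))
delta_true = np.array([2.0, 3.0, 1.0, 5.0])
u = np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
y = np.column_stack([np.ones(N * T), X]) @ delta_true + u
h_gamma, delta0, W0 = pb_quantile(y, X, N, T, gamma=0.05, n_boot=500)
d = delta0 - delta_true
print("H_gamma^B:", h_gamma, "region covers true delta:", d @ W0 @ d < h_gamma)
```

A candidate value of $\delta$ belongs to the region when its quadratic form around $\tilde{\delta}_0$ is below the simulated quantile, as in the last line.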

Next, we consider the problem of hypothesis testing about $\delta$:

$$H_0: \delta = \delta^* \quad \text{vs.} \quad H_1: \delta \ne \delta^*, \tag{2.18}$$

where $\delta^* = (\alpha^*, \beta_1^*, \ldots, \beta_K^*)'$ is a vector of pre-specified values. Our proposed test statistic is

$$D = (\tilde{\delta} - \delta^*)'(Z'\tilde{\Sigma}^{-1}Z)(\tilde{\delta} - \delta^*).$$

The null hypothesis in (2.18) is rejected at level $\gamma$ when $D_0 > H_\gamma^B$, where $D_0$ is the observed value of $D$. A PB p value can also be defined as

$$p = P(H_B > D_0).$$

Therefore, $H_0$ is rejected at level $\gamma$ when $p < \gamma$.
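Given simulated draws of $H_B$, the PB p value is simply the fraction of draws exceeding $D_0$. The short sketch below illustrates this; the function names `pb_p_value` and `pb_reject` are hypothetical, and the array of draws is a toy stand-in for values that would come from refitting the model on bootstrap samples $Y_B$.

```python
import numpy as np

def pb_p_value(h_boot, d0):
    """Monte Carlo estimate of p = P(H_B > D_0) from bootstrap draws of H_B."""
    h_boot = np.asarray(h_boot, dtype=float)
    return float(np.mean(h_boot > d0))

def pb_reject(h_boot, d0, gamma=0.05):
    """Reject H_0 at level gamma when the estimated PB p value is below gamma."""
    return pb_p_value(h_boot, d0) < gamma

# Toy draws standing in for H_B (illustrative numbers only).
h_boot = np.array([0.8, 1.5, 2.1, 3.7, 4.4, 5.0, 6.2, 7.9, 9.3, 12.6])
print(pb_p_value(h_boot, d0=8.0))   # two of the ten draws exceed 8.0 -> 0.2
```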

Unbalanced panel data models

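The unbalanced panel data model is given by

$$Y_{it} = \alpha + x_{it}'\beta + u_{it}, \tag{2.21}$$

with

$$u_{it} = \mu_i + \nu_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T_i,$$

where $Y_{it}$, $x_{it}$ and the other quantities are defined as in the balanced case, except that the number of time periods differs across cross-sections and equals $T_i$ for the $i$th cross-section. In matrix notation, equation (2.21) can be expressed as

$$Y = \alpha 1_n + X\beta + Z_\mu \mu + \nu = Z\delta + u,$$

where $n = \sum_{i=1}^{N} T_i$, $Y = (Y_{11}, \ldots, Y_{1T_1}, \ldots, Y_{N1}, \ldots, Y_{NT_N})'$, $X$ is an $n \times K$ matrix, $Z = [1_n, X]$, $\delta = (\alpha, \beta')'$, $Z_\mu = \operatorname{diag}(1_{T_1}, \ldots, 1_{T_N})$, $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$, $\nu = (\nu_{11}, \ldots, \nu_{1T_1}, \ldots, \nu_{N1}, \ldots, \nu_{NT_N})'$ and $u = Z_\mu \mu + \nu$.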

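Let $J_{T_i} = 1_{T_i} 1_{T_i}'$, $\bar{J}_{T_i} = \frac{1}{T_i} J_{T_i}$ and $E_{T_i} = I_{T_i} - \bar{J}_{T_i}$, for $i = 1, \ldots, N$. Then, the covariance matrix of $Y$ is

$$\operatorname{Cov}(Y) = \Sigma = \sigma_\mu^2 \operatorname{diag}(J_{T_1}, \ldots, J_{T_N}) + \sigma_\nu^2 I_n = \operatorname{diag}\left[(T_1\sigma_\mu^2 + \sigma_\nu^2)\bar{J}_{T_1}, \ldots, (T_N\sigma_\mu^2 + \sigma_\nu^2)\bar{J}_{T_N}\right] + \sigma_\nu^2 Q,$$

where $Q = \operatorname{diag}(E_{T_1}, \ldots, E_{T_N})$. It follows that

$$\Sigma^{-1} = \operatorname{diag}\left[(T_1\sigma_\mu^2 + \sigma_\nu^2)^{-1}\bar{J}_{T_1}, \ldots, (T_N\sigma_\mu^2 + \sigma_\nu^2)^{-1}\bar{J}_{T_N}\right] + (\sigma_\nu^2)^{-1} Q.$$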
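In the unbalanced case the covariance matrix is block diagonal with blocks of different sizes, and its inverse again only requires inverting scalars. The following sketch, with hypothetical helper names `block_diag` and `unbalanced_sigma` and illustrative values, builds both matrices and checks them against the direct definition.

```python
import numpy as np

def block_diag(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

def unbalanced_sigma(T_sizes, sigma_mu2, sigma_nu2):
    """Return (Sigma, Sigma^{-1}) for individuals observed T_1, ..., T_N times."""
    J_bars = [np.full((t, t), 1.0 / t) for t in T_sizes]
    Q = block_diag([np.eye(t) - Jb for t, Jb in zip(T_sizes, J_bars)])
    w = [t * sigma_mu2 + sigma_nu2 for t in T_sizes]   # T_i sigma_mu^2 + sigma_nu^2
    Sigma = block_diag([wi * Jb for wi, Jb in zip(w, J_bars)]) + sigma_nu2 * Q
    Sigma_inv = block_diag([Jb / wi for wi, Jb in zip(w, J_bars)]) + Q / sigma_nu2
    return Sigma, Sigma_inv

# Three individuals observed 2, 4 and 3 times (illustrative values).
T_sizes = [2, 4, 3]
n = sum(T_sizes)
Sigma, Sigma_inv = unbalanced_sigma(T_sizes, sigma_mu2=1.5, sigma_nu2=0.7)
direct = 1.5 * block_diag([np.ones((t, t)) for t in T_sizes]) + 0.7 * np.eye(n)
print(np.allclose(Sigma, direct), np.allclose(Sigma @ Sigma_inv, np.eye(n)))
```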

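Then, the generalized least squares estimator (GLSE) of $\delta$ is

$$\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) = (Z'\Sigma^{-1}Z)^{-1} Z'\Sigma^{-1} Y.$$

Also, the GLSE of $\delta$ is distributed as

$$\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) \sim N\left(\delta, (Z'\Sigma^{-1}Z)^{-1}\right).$$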

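Similar to the balanced case, we consider the following two quadratic forms, which define the between and within residual sums of squares, to obtain the estimators of $\sigma_\mu^2$ and $\sigma_\nu^2$:

$$S_1^2 = Y'PY - Y'PZ(Z'PZ)^{-1}Z'PY, \qquad S_2^2 = Y'QY - Y'QX(X'QX)^{-1}X'QY,$$

where $P = \operatorname{diag}(\bar{J}_{T_1}, \ldots, \bar{J}_{T_N})$ and $S_2^2/\sigma_\nu^2 \sim \chi^2_{(n-N-K)}$. According to [ ¹² ], the unbiased estimators of $\sigma_\nu^2$ and $\sigma_\mu^2$ are given by

$$\tilde{\sigma}_\nu^2 = \frac{S_2^2}{n-N-K}, \qquad \tilde{\sigma}_\mu^2 = \frac{S_1^2 - (N-K-1)\tilde{\sigma}_\nu^2}{n - \operatorname{tr}\left((Z'PZ)^{-1}Z'Z_\mu Z_\mu'Z\right)}.$$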

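Therefore, the natural estimators of $\Sigma$ and $\Sigma^{-1}$ are

$$\tilde{\Sigma} = \operatorname{diag}\left[(T_1\tilde{\sigma}_\mu^2 + \tilde{\sigma}_\nu^2)\bar{J}_{T_1}, \ldots, (T_N\tilde{\sigma}_\mu^2 + \tilde{\sigma}_\nu^2)\bar{J}_{T_N}\right] + \tilde{\sigma}_\nu^2 Q,$$

$$\tilde{\Sigma}^{-1} = \operatorname{diag}\left[(T_1\tilde{\sigma}_\mu^2 + \tilde{\sigma}_\nu^2)^{-1}\bar{J}_{T_1}, \ldots, (T_N\tilde{\sigma}_\mu^2 + \tilde{\sigma}_\nu^2)^{-1}\bar{J}_{T_N}\right] + (\tilde{\sigma}_\nu^2)^{-1} Q.$$

To construct a confidence region for $\delta$ in this case, we propose to use a random quantity similar to $H$ in (2.15) and the PB approach to approximate its distribution.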

Simulation study

In this section, we present the results of a Monte Carlo simulation study that compares the size and power of our PB approach with those of the generalized p value method of [ ²³ ] and the approximate method. We use the abbreviations PB, GPV and AP to refer to these three methods. First, we briefly review the GPV method.

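[ ²³ ] proposed a generalized p value method for testing $H_0: \delta = \delta^*$ vs. $H_1: \delta \ne \delta^*$ only for balanced panel data. He proposed the generalized F test for testing the null hypothesis, based on

$$\tilde{T}_T(Y; y, \sigma_1^2, \sigma_\nu^2, \delta) = \frac{\left(\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) - \delta^*\right)' S_\delta^{-2}(\sigma_1^2, \sigma_\nu^2) \left(\hat{\delta}(\sigma_1^2, \sigma_\nu^2, Y) - \delta^*\right)}{\left(\hat{\delta}\left(\sigma_1^2 \tfrac{ss_1}{SS_1}, \sigma_\nu^2 \tfrac{ss_\nu}{SS_\nu}, y\right) - \delta^*\right)' S_\delta^{-2}\left(\sigma_1^2 \tfrac{ss_1}{SS_1}, \sigma_\nu^2 \tfrac{ss_\nu}{SS_\nu}\right) \left(\hat{\delta}\left(\sigma_1^2 \tfrac{ss_1}{SS_1}, \sigma_\nu^2 \tfrac{ss_\nu}{SS_\nu}, y\right) - \delta^*\right)}.$$

Subsequently, the generalized p value can be computed as

$$p = P\left(\tilde{T}_T \ge 1 \mid H_0\right) = P\left(\frac{\chi^2}{\left(\hat{\delta}\left(\tfrac{ss_1}{U}, \tfrac{ss_\nu}{V}, y\right) - \delta^*\right)' S_\delta^{-2}\left(\tfrac{ss_1}{U}, \tfrac{ss_\nu}{V}\right) \left(\hat{\delta}\left(\tfrac{ss_1}{U}, \tfrac{ss_\nu}{V}, y\right) - \delta^*\right)} \ge 1\right),$$

where $S_\delta^2(\sigma_1^2, \sigma_\nu^2) = (Z'\Sigma^{-1}Z)^{-1}$, $U \sim \chi^2_{(N-K-1)}$, $V \sim \chi^2_{(N(T-1)-K)}$, $\chi^2 \sim \chi^2_{(K+1)}$, and $\chi^2$, $U$ and $V$ are mutually independent.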

Algorithm: We use the following steps to estimate the powers of the PB and GPV methods.

1. For a given $(N, T)$ and $(Z, \delta, \sigma_\mu^2, \sigma_\nu^2)$, generate $y$ and compute $s_1^2$, $s_\nu^2$, $\tilde{\Sigma}_0$, $\tilde{\delta}_0$ and the observed value $h_0$ of $H$ from (2.15).

2. Generate $Y_B \sim N(Z\tilde{\delta}_0, \tilde{\Sigma}_0)$, $U \sim \chi^2_{(N-K-1)}$, $V \sim \chi^2_{(N(T-1)-K)}$ and $\chi^2 \sim \chi^2_{(K+1)}$.

3. Repeat step 2 many times ($n = 5000$) to obtain the values $H_1^B, \ldots, H_n^B$ and $T_{T1}, \ldots, T_{Tn}$, and compute estimates of the p values of the PB and GPV methods.

4. Repeat steps 1 to 3 $m = 5000$ times to obtain estimates of the powers of the two tests.

For power estimation of the AP method, we compute the fraction of times that the value of $D_0$ exceeds $\chi^2_{(\gamma, K+1)}$.
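The PB part of the steps above can be sketched as a nested Monte Carlo loop. The code below is a simplified illustration under several assumptions: only the PB test is implemented (the GPV and AP tests are omitted), the design matrix is simulated rather than taken from real data, the names `fit` and `pb_rejection_rate` are hypothetical, and the default numbers of replications are far smaller than the 5000 used in the study so that the example runs quickly.

```python
import numpy as np

def fit(y, Z, P, Q, N, T):
    """Feasible GLSE, W = Z' Sigma~^{-1} Z, and the two variance estimates."""
    K = Z.shape[1] - 1
    X = Z[:, 1:]
    Py, Qy = P @ y, Q @ y
    ZPy, XQy = Z.T @ Py, X.T @ Qy
    s1 = (y @ Py - ZPy @ np.linalg.solve(Z.T @ P @ Z, ZPy)) / (N - K - 1)
    snu = (y @ Qy - XQy @ np.linalg.solve(X.T @ Q @ X, XQy)) / (N * (T - 1) - K)
    Sigma_inv = P / s1 + Q / snu
    W = Z.T @ Sigma_inv @ Z
    return np.linalg.solve(W, Z.T @ Sigma_inv @ y), W, s1, snu

def pb_rejection_rate(X, N, T, delta, delta_star, sigma_mu2, sigma_nu2,
                      gamma=0.05, n_boot=200, m_rep=200, seed=0):
    """Estimated rejection rate of the PB test (size if delta equals delta_star)."""
    rng = np.random.default_rng(seed)
    NT = N * T
    Z = np.column_stack([np.ones(NT), X])
    P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
    Q = np.eye(NT) - P
    rejections = 0
    for rep in range(m_rep):
        # Step 1: generate y and compute the observed statistic D_0.
        mu = np.repeat(rng.normal(scale=np.sqrt(sigma_mu2), size=N), T)
        y = Z @ delta + mu + rng.normal(scale=np.sqrt(sigma_nu2), size=NT)
        d_hat, W0, s1, snu = fit(y, Z, P, Q, N, T)
        D0 = (d_hat - delta_star) @ W0 @ (d_hat - delta_star)
        # Steps 2-3: draw Y_B repeatedly and estimate the PB p value.
        root = np.sqrt(s1) * P + np.sqrt(snu) * Q   # square root of Sigma~_0
        exceed = 0
        for b in range(n_boot):
            yB = Z @ d_hat + root @ rng.standard_normal(NT)
            dB, WB, _, _ = fit(yB, Z, P, Q, N, T)
            exceed += int((dB - d_hat) @ WB @ (dB - d_hat) > D0)
        rejections += int(exceed / n_boot < gamma)
    # Step 4: the fraction of rejections estimates the size or power.
    return rejections / m_rep

# Simulated design (an assumption; not the data used in the study).
rng = np.random.default_rng(7)
N, T = 10, 6
X = rng.normal(size=(N * T, 3))
delta_star = np.array([2.0, 3.0, 1.0, 5.0])
size = pb_rejection_rate(X, N, T, delta_star, delta_star, 1.0, 1.0,
                         n_boot=100, m_rep=40)
power = pb_rejection_rate(X, N, T, delta_star + 1.0, delta_star, 0.01, 1.0,
                          n_boot=100, m_rep=20)
print("estimated size:", size, "estimated power:", power)
```

With so few replications the estimates are noisy; they only indicate how the loop is organised, not the accuracy reported in the tables.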

The simulation results for different values of $N$, $T$, $\sigma_\nu^2$ and $\sigma_\mu^2$ are shown in Table 1. We take $\delta^*$ equal to $(2, 3, 1, 5)'$ and let $\delta$ take various values. Note that, in this simulation, the matrix $X$ consists of three columns of the panel data reported in Table 2, namely (lnY/N, lnPMG/PGDP, lnCar/N); this example is described in Section 5. The first results column of Table 1 shows the estimated type I error rates (sizes) of the tests, and the other four columns show estimated powers. We use the following criterion to compare the methods. First, a method is preferred when its estimated size is not significantly different from 0.05; we refer to such a method as reliable. Second, the best method is the one with the largest power among the reliable methods; see [ ⁷ , ⁹ , ¹⁰ , ²⁰ ] and [ ⁶ ]. By the central limit theorem, with 5000 replications the 98% confidence interval around an estimated size covers the nominal level 0.05 when the estimate lies between 0.0428 and 0.0572. In other words, if the estimated size of a test is below or above these bounds, we conclude that the test is conservative or liberal, respectively. In Table 1, estimated sizes in boldface are significantly less than or greater than 0.05.
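The bounds 0.0428 and 0.0572 follow from the normal approximation to the binomial distribution of the number of rejections. A small sketch using only the Python standard library reproduces them; the function name `size_bounds` is a hypothetical helper, and the defaults correspond to the nominal level 0.05, 5000 replications and a 98% level.

```python
from math import sqrt
from statistics import NormalDist

def size_bounds(nominal=0.05, m=5000, level=0.98):
    """Normal-approximation bounds for an estimated size based on m replications."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)       # two-sided critical value
    half_width = z * sqrt(nominal * (1 - nominal) / m)  # z times the binomial standard error
    return nominal - half_width, nominal + half_width

lower, upper = size_bounds()
print(round(lower, 4), round(upper, 4))   # 0.0428 0.0572
```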

Table 1

Simulated powers of the GPV, PB and AP tests at 5% nominal level

(N, T)	($\sigma_\mu^2$, $\sigma_\nu^2$)	Test	$\delta$ = (2,3,1,5)	$\delta$ = (2.1,3.1,1.1,5.1)	$\delta$ = (4,3,1,5)	$\delta$ = (2,3.1,1,5.1)	$\delta$ = (2,3,1.5,5.1)
(10, 6)	(0.01, 1)	GPV	0.0628	1.0000	1.0000	1.0000	1.0000
		PB	0.0458	1.0000	1.0000	1.0000	1.0000
		AP	0.1448	1.0000	1.0000	1.0000	1.0000
	(1, 1)	GPV	0.0718	0.8582	0.9570	0.8784	0.8194
		PB	0.0446	0.8700	0.9754	0.8956	0.7896
		AP	0.1308	0.9530	0.9942	0.9654	0.9114
	(10, 1)	GPV	0.0638	0.2612	0.2622	0.2578	0.3680
		PB	0.0508	0.2388	0.2870	0.2416	0.3006
		AP	0.0972	0.3654	0.4068	0.3550	0.4456
	(100, 1)	GPV	0.0576	0.1322	0.0814	0.1336	0.2556
		PB	0.0466	0.1074	0.0756	0.1078	0.2483
		AP	0.0912	0.1788	0.1280	0.1782	0.3312
(12, 5)	(0.01, 1)	GPV	0.0518	1.0000	1.0000	1.0000	1.0000
		PB	0.0486	1.0000	1.0000	1.0000	1.0000
		AP	0.1408	1.0000	1.0000	1.0000	1.0000
	(1, 1)	GPV	0.0494	0.8594	0.9796	0.8850	0.6903
		PB	0.0496	0.8484	0.9760	0.8730	0.6812
		AP	0.1300	0.9620	0.9964	0.9732	0.8734
	(10, 1)	GPV	0.0628	0.1826	0.2724	0.1888	0.1386
		PB	0.0508	0.1632	0.2362	0.1596	0.1200
		AP	0.1352	0.3320	0.4498	0.3354	0.2662
	(100, 1)	GPV	0.0772	0.0880	0.0654	0.0630	0.0710
		PB	0.0547	0.0644	0.0484	0.0446	0.0528
		AP	0.1318	0.1466	0.1106	0.1138	0.1286
(20, 3)	(0.01, 1)	GPV	0.0588	1.0000	1.0000	1.0000	1.0000
		PB	0.0492	1.0000	1.0000	1.0000	1.0000
		AP	0.0964	1.0000	1.0000	1.0000	1.0000
	(1, 1)	GPV	0.0616	0.9948	1.0000	0.9968	0.9726
		PB	0.0528	0.9960	1.0000	0.9968	0.9698
		AP	0.0952	0.9988	1.0000	0.9988	0.9840
	(10, 1)	GPV	0.0606	0.3576	0.5180	0.3628	0.3818
		PB	0.0516	0.3456	0.5166	0.3558	0.3548
		AP	0.0828	0.4302	0.6010	0.4446	0.4446
	(100, 1)	GPV	0.0510	0.0968	0.0950	0.0906	0.2002
		PB	0.0456	0.0912	0.0900	0.0836	0.1906
		AP	0.0704	0.1284	0.1266	0.1190	0.2452

Table 2

Data of motor gasoline consumption

Country	Year	lnGas/Car	lnY/N	lnPMG/PGDP	lnCar/N
Austria	1960	4.1732	-6.4743	-0.3345	-9.7668
	1961	4.1010	-6.4260	-0.3513	-9.6086
	1962	4.0732	-6.4073	-0.3795	-9.4573
	1963	4.0595	-6.3707	-0.4143	-9.3432
	1964	4.0377	-6.3222	-0.4453	-9.2377
Belgium	1960	4.1640	-6.2151	-0.1657	-9.4055
	1961	4.1244	-6.1768	-0.1717	-9.3031
	1962	4.0760	-6.1296	-0.2223	-9.2181
	1963	4.0013	-6.0940	-0.2505	-9.1149
	1964	3.9944	-6.0365	-0.2759	-9.0055
Canada	1960	4.8552	-5.8897	-0.9721	-8.3789
	1961	4.8266	-5.8843	-0.9723	-8.3467
	1962	4.8505	-5.8446	-0.9786	-8.3205
	1963	4.8381	-5.7924	-1.0190	-8.2694
	1964	4.8398	-5.7601	-1.0029	-8.2524
Denmark	1960	4.5020	-6.0617	-0.1957	-9.3262
	1961	4.4828	-6.0009	-0.2536	-9.1931
	1962	4.3854	-5.9875	-0.2188	-9.0473
	1963	4.3540	-5.9731	-0.2480	-8.9528
	1964	4.3264	-5.8947	-0.3065	-8.8526
France	1960	3.9077	-6.2644	-0.0196	-9.1457
	1961	3.8856	-6.2209	-0.0239	-9.0443
	1962	3.8237	-6.1736	-0.0689	-8.9301
	1963	3.7890	-6.1371	-0.1379	-8.8186
	1964	3.7671	-6.0872	-0.1978	-8.7110
Germany	1960	3.9170	-6.1598	-0.1859	-9.3425
	1961	3.8853	-6.1209	-0.2310	-9.1838
	1962	3.8715	-6.0943	-0.3438	-9.0373
	1963	3.8488	-6.0684	-0.3746	-8.9136
	1964	3.8690	-6.0134	-0.3997	-8.8110
Spain	1960	4.7494	-6.1661	1.1253	-11.5884
	1961	4.5892	-6.0578	1.1096	-11.3840
	1962	4.4291	-5.9805	1.0570	-11.1578
	1963	4.3465	-5.9051	0.9768	-10.9845
	1964	4.3006	-5.8585	0.9153	-10.7879
Sweden	1960	4.0630	-8.0725	-2.5204	-8.7427
	1961	4.0619	-8.0196	-2.5715	-8.6599
	1962	4.0064	-7.9972	-2.5345	-8.5774
	1963	4.0028	-7.9667	-2.6051	-8.4943
	1964	4.0249	-7.8976	-2.6580	-8.4335
Switzer	1960	4.3976	-6.1561	-0.8232	-9.2624
	1961	4.4413	-6.1116	-0.8656	-9.1582
	1962	4.2871	-6.0930	-0.8222	-9.0461
	1963	4.3125	-6.0680	-0.8601	-8.9508
	1964	4.3134	-6.0215	-0.8677	-8.8394
Turkey	1960	6.1296	-7.8011	-0.2534	-13.4752
	1961	6.1062	-7.7867	-0.3425	-13.3847
	1962	6.0846	-7.8363	-0.4082	-13.2459
	1963	6.0751	-7.6312	-0.2250	-13.2550
	1964	6.0646	-7.6269	-0.2522	-13.2103
U.K.	1960	4.1002	-6.1868	-0.3911	-9.1176
	1961	4.0886	-6.1689	-0.4519	-9.0489
	1962	4.0481	-6.1667	-0.4229	-8.9669
	1963	3.9853	-6.1307	-0.4634	-8.8559
	1964	3.9768	-6.0864	-0.4958	-8.7498
U.S.A.	1960	4.8240	-5.6984	-1.1211	-8.0195
	1961	4.7963	-5.6952	-1.1462	-7.9993
	1962	4.7989	-5.6488	-1.1619	-7.9864
	1963	4.7879	-5.6269	-1.1799	-7.9595
	1964	4.8083	-5.5871	-1.2003	-7.9299

Note that the estimated powers vary slightly from one simulation to another [ ⁹ ]. Therefore, we used the well-known z test to compare the powers of two methods. The powers of two test procedures differ significantly at the $100\alpha\%$ level when $|\hat{p}_1 - \hat{p}_2| > Z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/5000}$, where $\hat{p} = (\hat{p}_1 + \hat{p}_2)/2$ and $\hat{p}_1$ and $\hat{p}_2$ denote the estimated powers of the two test procedures based on 5000 samples. In the following remarks, we discuss the simulation results.
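The comparison rule can be written as a small function. The sketch below implements the inequality as stated in the text; the function name `powers_differ` and the `factor` argument are assumptions added for illustration. With `factor=1.0` the rule is exactly the one printed above, while `factor=2.0` would correspond to the pooled variance for two independent samples of equal size, which is offered only as an optional variant.

```python
from math import sqrt
from statistics import NormalDist

def powers_differ(p1, p2, n_sim=5000, alpha=0.05, factor=1.0):
    """Return True when two estimated powers differ significantly at level alpha."""
    p_bar = (p1 + p2) / 2                      # average of the two estimated powers
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{alpha/2}
    threshold = z * sqrt(factor * p_bar * (1 - p_bar) / n_sim)
    return abs(p1 - p2) > threshold

# AP vs PB for (N, T) = (10, 6), variances (1, 1), delta = (2.1, 3.1, 1.1, 5.1).
print(powers_differ(0.9530, 0.8700))
# Identical estimated powers can never be declared different.
print(powers_differ(0.9968, 0.9968))
```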

Remark 1

In all cases considered here, the estimated sizes of our PB test vary between 0.0446 and 0.0547, that is, within the bounds 0.0428 and 0.0572, which shows that our proposed test behaves like an exact test.

Remark 2

The simulated sizes of the GPV and AP tests often exceed the upper limit of this range, so these methods are regarded as liberal. Therefore, in such cases, their powers cannot be compared with the power of our parametric bootstrap approach.

Remark 3

In the cases where the estimated size of the GPV test is close to 0.05, the powers of the PB and GPV tests are not significantly different.

Remark 4

Overall, the proposed PB method appears to perform better than the other two methods in terms of both controlling the type I error rate and power.


Inferences on the regression coefficients in panel data models: parametric bootstrap approach
