Chapter 9 Heteroskedasticity: Testing and Correction

Contents
- What is heteroskedasticity?
- Why worry about heteroskedasticity?
- How to test for heteroskedasticity?
- Corrections for heteroskedasticity

What is heteroskedasticity?

What is Heteroskedasticity?
Recall that the homoskedasticity assumption implies that, conditional on the explanatory variables, the variance of the unobserved error u is constant:
Var(u|X) = σ²  (homoskedasticity)
If this is not true, that is, if the variance of u differs across values of the Xs, the errors are heteroskedastic:
Var(ui|Xi) = σi²  (heteroskedasticity)

[Figure: example of homoskedasticity]
[Figure: example of heteroskedasticity]

Examples
Cross-section data generally induce heteroskedasticity more easily, because different individuals have different characteristics.
Consider a cross-section study of family income and expenditure. It seems plausible that low-income households spend at a rather steady rate, while the spending patterns of high-income households are relatively volatile.
If we examine sales for a cross section of firms in one industry, the error terms associated with very large firms might have larger variances than those associated with smaller firms: sales of larger firms might be more volatile than sales of smaller firms.

[Figure: patterns of heteroskedasticity]
[Figure: the relation between R&D expenditure and sales]
[Figure: scatter plot of R&D expenditure against sales]

Why Worry About Heteroskedasticity?

The consequences of heteroskedasticity
OLS estimates are still unbiased and consistent, even if we do not assume homoskedasticity. Take the simple regression as an example:
Y = β0 + β1 X + u
The OLS estimator of β1 is β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)², and its unbiasedness and consistency do not depend on Var(u|X) being constant.

The consequences of heteroskedasticity (cont.)
R² and adjusted R² are unaffected by heteroskedasticity: since RSS and TSS are not affected by heteroskedasticity, R² and adjusted R² are not affected either.

The consequences of heteroskedasticity (cont.)
The usual standard errors of the estimates are biased when there is heteroskedasticity.

The consequences of heteroskedasticity (cont.)
The OLS estimates are not efficient; that is, their variances are no longer the smallest possible. And if the standard errors are biased, we cannot use the usual t or F statistics for inference: the t test, the F test, and the confidence intervals based on them do not work. In short, under heteroskedasticity we cannot use the t and F tests as usual; if we do, we may get misleading results.

Summary of the consequences of heteroskedasticity
- OLS estimates are still unbiased and consistent.
- R² and adjusted R² are unaffected by heteroskedasticity.
- The standard errors of the estimates are biased.
- The OLS estimates are not efficient.
- Consequently, the t test, the F test, and the confidence intervals do not work.

How to test for heteroskedasticity?

Residual plot
In OLS estimation we often use the residual ei to estimate the random error ui, so we can check for heteroskedasticity in ui by examining ei. We plot ei² against X.

Residual plot (cont.)
[Figures: typical patterns of squared residuals plotted against X]

Residual plot (cont.)
If there is more than one independent variable, we should plot the squared residuals against each independent variable separately. A shortcut when there are several independent variables is to plot the squared residuals against the fitted value ŷ, because ŷ is just a linear combination of all the Xs (see the Stata sketch below).

Residual plot: example 9.2
[Figure: squared residuals of the R&D regression plotted against sales]
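A minimal Stata sketch of the residual-plot check, assuming the example 9.2 data with variables rdexp and sales are already in memory (the generated names e, esq, and yhat are illustrative):

    * original regression; save residuals and fitted values
    regress rdexp sales
    predict e, residuals
    predict yhat, xb
    gen esq = e^2
    * plot squared residuals against the regressor and against the fitted value
    scatter esq sales
    scatter esq yhat

A fan-shaped or systematically widening pattern in either plot suggests heteroskedasticity.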
Park test
If there is heteroskedasticity, the variance of the error term ui, σi², may be correlated with some of the independent variables. We can therefore test whether σi² is related to any of the explanatory variables: if it is, there is heteroskedasticity; otherwise there is homoskedasticity. For the simple regression model, the Park test uses the auxiliary regression
ln(σi²) = β0 + β1 ln(Xi) + vi

Procedure of the Park test
1. Regress the dependent variable (Y) on the independent variables (Xs).
2. Obtain the residuals ei of this regression and compute ei².
3. Taking ln(ei²) as the dependent variable and the logged original independent variables as explanatory variables, run the new regression ln(ei²) = β0 + β1 ln(Xi) + vi.
4. Test H0: β1 = 0 against H1: β1 ≠ 0. If we cannot reject the null hypothesis, there is no evidence of heteroskedasticity, i.e. homoskedasticity.

Park test: example
Take example 9.2. First, regress R&D expenditure (rdexp) on sales (sales):
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
n = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Second, obtain the residuals ei of this regression.
Third, regress ln(ei²) on ln(sales):
ln(ei²) = 1.216 ln(sales)
se = (0.057), p = (0.000), R² = 0.9637, adj-R² = 0.9615
Finally, test whether the slope of the second regression equals zero. From the p-value of the coefficient, at the 5% significance level we can reject the null hypothesis, so there is heteroskedasticity in the first regression.
Note: the Park test is not a very good test for heteroskedasticity, because its particular specification of the auxiliary regression may itself be heteroskedastic.

Glejser test
The idea of the Glejser test is the same as that of the Park test, but Glejser suggested regressions such as the following to detect heteroskedasticity in u:
|ei| = β0 + β1 Xi + vi
|ei| = β0 + β1 √Xi + vi
|ei| = β0 + β1 (1/Xi) + vi
Again we test H0: β1 = 0 against H1: β1 ≠ 0. If we can reject the null hypothesis, there is heteroskedasticity; otherwise, homoskedasticity.

Glejser test: example 9.2
First, regress R&D expenditure (rdexp) on sales (sales):
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
n = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Second, obtain the residuals ei of this regression.
Third, regress |ei| on 1/sales:
|ei| = 2273.65 − 1992500 (1/sales)
se = (604.69) (12300000), p = (0.002) (0.125)
Finally, test whether the slope is zero. The p-value of the slope is larger than the 5% significance level, so we cannot reject the null hypothesis: there is no evidence of heteroskedasticity under this specification.

The White Test
The White test is a more general test, which allows for nonlinearities by using the squares and cross-products of all the Xs. For example, with k = 3:
Y = β0 + β1X1 + β2X2 + β3X3 + u
e² = δ0 + δ1X1 + δ2X2 + δ3X3 + δ4X1² + δ5X2² + δ6X3² + δ7X1X2 + δ8X1X3 + δ9X2X3 + v
Use an F or LM test of whether the Xj, Xj², and XjXh terms are jointly significant, i.e. test H0: δ1 = δ2 = … = δ9 = 0 against H1: H0 is not true. If we can reject H0, there is heteroskedasticity.

The White Test (cont.)
To test H0: δ1 = δ2 = … = δ9 = 0 we can use the F test from Chapter 4. Let R² denote the goodness of fit of the auxiliary regression:
F = (R²/k) / ((1 − R²)/(n − k − 1))
We can also use the LM test:
LM = nR² ~ χ²(k)
where n is the number of observations and k is the number of restrictions (the number of regressors in the auxiliary regression). A Stata sketch of the White test follows below.
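A sketch of the White test for example 9.2 in Stata, assuming rdexp, sales, and profits are in memory (the generated variable names are illustrative); Stata's built-in estat imtest, white after regress gives an equivalent test:

    * original regression; save residuals
    regress rdexp sales profits
    predict e, residuals
    gen esq = e^2
    * squares and cross-product of the regressors
    gen salessq = sales^2
    gen profitssq = profits^2
    gen salesprofits = sales*profits
    * auxiliary regression; the overall F statistic reported by regress is the White F test
    regress esq sales profits salessq profitssq salesprofits
    * LM version: n*R^2, compared with the chi-squared(5) critical value
    display "LM = " e(N)*e(r2) "   5% critical value = " invchi2tail(5, 0.05)
    * built-in alternative:
    * regress rdexp sales profits
    * estat imtest, white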
The White Test: example 9.2
First, regress R&D expenditure (rdexp) on sales (sales) and profits (profits):
rdexp = −13.93 + 0.0126 sales + 0.2398 profits
se = (991.997) (0.018) (0.1986)
p = (0.989) (0.496) (0.246)
n = 18, R² = 0.5245, adj-R² = 0.4611, F = 8.27
Second, obtain the residuals e from the regression above.
Third, regress e² on sales, profits, sales², profits², and sales·profits:
e² = 693735.5 + 135.00 sales − 1965.7 profits − 0.0027 sales² − 0.116 profits² + 0.050 sales·profits
n = 18, R² = 0.8900, F(5,12) = 19.42, Prob > F = 0.0000
Finally, test H0: δ1 = δ2 = δ3 = δ4 = δ5 = 0. The p-value of the F test is 0.0000, so we reject H0. The LM test gives LM = nR² = 18 × 0.89 = 16.02 > 11.07, the 5% critical value of χ²(5), so we also reject H0. There is heteroskedasticity in the first regression.

Alternate form of the White test
The full White test can become unwieldy rather quickly. Note that the fitted values from OLS, ŷ, are a function of all the Xs, so ŷ² is a function of their squares and cross-products; ŷ and ŷ² can therefore proxy for all of the Xj, Xj², and XjXh. Regress the squared residuals on ŷ and ŷ² and use the R² to form an F or LM statistic. Note that we are now testing only 2 restrictions.

The procedure of the special case of the White test
1. Regress Y on X1, X2, …, Xk and obtain the residuals ei.
2. Calculate ŷ and ŷ² (in Stata: predict ybar, xb followed by gen ybarsq = ybar^2).
3. Regress e² on ŷ and ŷ² and test the joint hypothesis that both coefficients are zero.
4. Use the F statistic or the LM test to test the null hypothesis of homoskedasticity.

Example: White test in a wage determination equation
First, estimate the model by OLS without considering heteroskedasticity:
wage = −2.87 + 0.599 educ + 0.022 exper + 0.139 tenure
Calculate the residuals ei and the fitted values of wage, and from them ei² and the squared fitted values.
Regress ei² on the fitted values and their squares:
ei² = 7.36 − 2.86 ŵage + 0.49 ŵage²
se = (5.62) (1.76) (0.125)
n = 526, R² = 0.0984, F = 28.55, Prob > F = 0.000
Test H0: δ1 = δ2 = 0. The F test gives F = 28.55 with Prob > F = 0.000, and LM = nR² = 526 × 0.0984 ≈ 51.8 > 5.99, the 5% critical value of χ²(2), so we reject H0.

Corrections for Heteroskedasticity

Corrections for heteroskedasticity: known variances
Suppose the variances are known: Var(ui|X) = σi². The original model is
Yi = β0 + β1Xi1 + … + βkXik + ui
Divide both sides by σi. The new disturbance is ui* = ui/σi, so Var(ui*) = Var(ui/σi) = Var(ui)/σi² = 1. The new model is
Yi/σi = β0/σi + β1Xi1/σi + … + βkXik/σi + ui/σi, that is, Y* = β0* + β1X1* + … + βkXk* + u*
We can estimate the new model by OLS; this is called weighted least squares (WLS). But usually we do not know the variances.

Case where the form is known up to a multiplicative constant
Suppose the heteroskedasticity can be modeled as Var(u|X) = σ²h(X), where the trick is to figure out what h(X) ≡ hi looks like. Then E(ui/√hi|X) = 0, because hi is only a function of X, and Var(ui/√hi|X) = σ², because Var(u|X) = σ²hi. So if we divide the whole equation by √hi, we obtain a model in which the error is homoskedastic.

Case 1: h(X) = X
The simple regression model is Yi = β0 + β1Xi + ui, where ui is heteroskedastic with Var(ui|Xi) = σ²h(Xi) = σ²Xi. Divide both sides of the original model by √Xi to get the new model
Yi/√Xi = β0/√Xi + β1Xi/√Xi + ui/√Xi, rewritten as
Yi/√Xi = β0(1/√Xi) + β1√Xi + vi   (*)
Var(vi) = Var(ui/√Xi) = Var(ui)/Xi = σ², which is homoskedastic. Therefore the new equation (*) can be estimated by OLS.

Example 9.6 (textbook 2e, p. 233)
We have shown that there is heteroskedasticity in the R&D expenditure model. Now assume that the variance of the error term changes with the independent variable sales, that is, Var(ui) = σ² salesi. The original model is
rdexpi = β0 + β1 salesi + ui
The transformed model is
rdexpi/√salesi = β0(1/√salesi) + β1√salesi + vi, where vi = ui/√salesi

Example 9.6 (textbook 2e, p. 233), cont.
The estimate of the transformed model is
rdexp/√sales = −246.73 (1/√sales) + 0.0368 √sales, i.e. rdexp = −246.73 + 0.0368 sales
se = (381.16) (0.0071)
t = (−0.65) (5.17)
n = 18, R² = 0.6923, adj-R² = 0.6538, F = 18.00
WLS command in Stata: regress rdexp sales [aweight = 1/sales]
The estimate of the original model is
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
t = (0.19) (3.83)
n = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Compare the results of the two estimations: what do you find? (A fuller Stata sketch follows Case 2 below.)

Case 2: h(X) = X²
The simple regression model is Yi = β0 + β1Xi + ui, where ui is heteroskedastic with Var(ui|Xi) = σ²h(Xi) = σ²Xi². Divide both sides of the original model by Xi to get the new model
Yi/Xi = β0/Xi + β1Xi/Xi + ui/Xi, rewritten as
Yi/Xi = β0(1/Xi) + β1 + vi   (*)
Var(vi) = Var(ui/Xi) = Var(ui)/Xi² = σ², which is homoskedastic. Therefore the new equation (*) can be estimated by OLS.
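A Stata sketch of the WLS estimation in example 9.6, assuming rdexp and sales are in memory (generated names illustrative). It shows the aweight shortcut quoted above and the equivalent manual transformation by √salesi; both give the same coefficient estimates:

    * WLS via analytic weights, weight = 1/h(X) = 1/sales
    regress rdexp sales [aweight = 1/sales]
    * equivalent manual transformation: divide the equation by sqrt(sales)
    gen rdexp_t = rdexp/sqrt(sales)
    gen invsqrt_sales = 1/sqrt(sales)
    gen sqrt_sales = sqrt(sales)
    * no constant: the coefficient on invsqrt_sales estimates b0, that on sqrt_sales estimates b1
    regress rdexp_t invsqrt_sales sqrt_sales, noconstant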
Generalized Least Squares
Estimating the transformed equation by OLS is an example of generalized least squares (GLS). GLS is BLUE in this case, because the transformed equation meets the Gauss-Markov assumptions. GLS here is a weighted least squares (WLS) procedure in which each squared residual is weighted by the inverse of Var(ui|xi).

More on WLS
[Derivation shown on slide]

More on WLS (cont.)
[Derivation shown on slide]

More on WLS (cont.)
A similar weighting arises when we use per capita data at the city, county, state, or country level. If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. Therefore, weighted least squares with weights equal to the population is appropriate.

Summary of WLS
- WLS is great if we know what Var(ui|xi) looks like.
- In most cases we will not know the form of the heteroskedasticity.
- An example where we do is when the data are aggregated but the model is specified at the individual level; we then want to weight each aggregate observation by the inverse of the number of individuals.

Feasible GLS
More typical is the case where we do not know the form of the heteroskedasticity. In this case we need to estimate h(xi). Typically we start with a fairly flexible model, such as
Var(u|x) = σ² exp(δ0 + δ1x1 + … + δkxk)
Since we do not know the δs, we must estimate them.

Feasible GLS (continued)
Our assumption implies that u² = σ² exp(δ0 + δ1x1 + … + δkxk) v, where E(v|x) = 1. Then, if E(v) = 1,
ln(u²) = α0 + δ1x1 + … + δkxk + e
where E(e) = 0 and e is independent of x. Since the OLS residual is an estimate of u, we can estimate this equation by OLS.

Feasible GLS (continued)
An estimate of h is obtained as ĥ = exp(ĝ), where ĝ is the fitted value from this regression, and the inverse of ĥ is our weight. So, what did we do?
1. Run the original OLS model, save the residuals e, square them, and take the log, ln(e²).
2. Regress ln(e²) on all of the independent variables and get the fitted values ĝ.
3. Do WLS using 1/exp(ĝ) as the weight.

Example of FGLS: demand for cigarettes (SMOKE.RAW)
What determines people's daily cigarette consumption?
Variables:
- cigs: cigarettes smoked per day
- income: annual income, in dollars
- cigpric: price per pack of cigarettes
- educ: years of schooling
- age: age in years
- restaurn: dummy variable = 1 if the state has restaurant smoking restrictions
OLS model:
cigs = −3.64 + 0.88 log(income) − 0.75 log(cigpric) − 0.50 educ + 0.77 age − 0.009 age² − 2.83 restaurn

Example of FGLS: demand for cigarettes (cont.)
Use the White test for heteroskedasticity: obtain e² and regress e² on all the independent variables. We get F = 13.69 (p-value = 0.000), or LM = 807 × 0.0329 = 26.55 (p-value = 0.000), which shows that there is heteroskedasticity.
Regress ln(e²) on all the independent variables and obtain the fitted values ĝ. Transform all the data with 1/√ĥ, where ĥ = exp(ĝ), and estimate the transformed equation without a constant (a Stata sketch follows below). The FGLS estimates are
cigs = 5.63 + 1.295 log(income) − 2.94 log(cigpric) − 0.463 educ + 0.482 age − 0.0056 age² − 3.461 restaurn
The income effect is now statistically significant and larger in magnitude. The estimates changed somewhat, but the basic story is the same: cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.
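A minimal Stata sketch of the FGLS procedure for the cigarette demand example, assuming the SMOKE data with cigs, income, cigpric, educ, age, and restaurn are in memory (all generated variable names are illustrative):

    * regressors used in the model
    gen lincome = ln(income)
    gen lcigpric = ln(cigpric)
    gen agesq = age^2
    * step 1: original OLS regression; save residuals
    regress cigs lincome lcigpric educ age agesq restaurn
    predict u, residuals
    gen logusq = ln(u^2)
    * step 2: regress ln(e^2) on all regressors and get the fitted values g-hat
    regress logusq lincome lcigpric educ age agesq restaurn
    predict ghat, xb
    gen hhat = exp(ghat)
    * step 3: WLS with weight 1/h-hat
    regress cigs lincome lcigpric educ age agesq restaurn [aweight = 1/hhat]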
New specifications to correct heteroskedasticity
Using a different specification can sometimes correct the heteroskedasticity; the log-linear model is more often homoskedastic. For example, in example 9.2 we can use the specification
ln(rdexpi) = β0 + β1 ln(salesi) + ui
The estimated model is
ln(rdexpi) = −7.37 + 1.32 ln(salesi)
se = (1.85) (0.17)
t = (−3.99) (7.87)
n = 18, R² = 0.7946, adj-R² = 0.7818, F = 61.91

New specifications to correct heteroskedasticity (cont.)
Using the White test for heteroskedasticity:
ê² = −5.84 + 0.96 ln(salesi) − 0.034 ln(salesi)²
se = (15.16) (2.84) (0.13)
t = (−0.39) (0.34) (−0.26)
n = 18, R² = 0.1454, adj-R² = 0.0315, F = 1.28, Prob > F = 0.3078
Test H0: δ1 = δ2 = 0. F test: F = 1.28 < 3.68, the 5% critical value of F. LM test: LM = nR² = 18 × 0.1454 = 2.62 < 5.99, the 5% critical value of χ²(2). So we cannot reject H0: there is no evidence of heteroskedasticity in the log-log specification.

White heteroskedasticity-robust standard errors
We have learned that although the OLS estimator is still unbiased under heteroskedasticity, it is not efficient, and the usual OLS standard errors are biased, so the corresponding t and F tests no longer work. White proposed a method that takes the heteroskedasticity into account and computes correct (heteroskedasticity-robust) standard errors, with which the corresponding t test works. In this case we can use the t and F tests, but this requires a large sample, because the robust standard errors are only justified asymptotically.
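In Stata, White heteroskedasticity-robust standard errors are obtained with the vce(robust) option; a minimal sketch for the example 9.2 regression (variable names assumed in memory):

    * OLS point estimates with heteroskedasticity-robust standard errors
    regress rdexp sales profits, vce(robust)
    * older equivalent syntax:
    * regress rdexp sales profits, robust

The point estimates are identical to ordinary OLS; only the standard errors, and hence the reported t and F statistics, change.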