Econometrics: ch-03-wooldridg.ppt
Chapter 3: Multiple Regression Analysis: Estimation
Wooldridge: Introductory Econometrics: A Modern Approach, 5e
Instructed by Professor Yuan, Huiping

CHAPTER 3 Multiple Regression Analysis: Estimation
3.1 Motivation for Multiple Regression
3.2 Mechanics and Interpretation of OLS
3.3 The Expected Value of the OLS Estimators
3.4 The Variance of the OLS Estimators
3.5 Efficiency of OLS: The Gauss-Markov Theorem
3.6 Some Comments on the Language of Multiple Regression Analysis
Assignments: Problems 7, 9, 10, 11, 13; Computer Exercises C1, C3, C5, C6, C8

3.1 Motivation for Multiple Regression

Definition of the multiple linear regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
"Explains variable $y$ in terms of variables $x_1, x_2, \dots, x_k$"
$y$: dependent variable, explained variable, response variable
$x_1, \dots, x_k$: independent variables, explanatory variables, regressors
$u$: error term, disturbance, unobservables
$\beta_0$: intercept
$\beta_1, \dots, \beta_k$: slope parameters

Motivation for multiple regression
Incorporate more explanatory factors into the model
Explicitly hold fixed other factors that otherwise would be in $u$
Allow for more flexible functional forms

Example: Wage equation
$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + u$
wage: hourly wage; educ: years of education; exper: labor market experience; $u$: all other factors
$\beta_1$ now measures the effect of education explicitly holding experience fixed

Example: Average test scores and per student spending
$\mathit{avgscore} = \beta_0 + \beta_1\,\mathit{expend} + \beta_2\,\mathit{avginc} + u$
avgscore: average standardized test score of school; expend: per student spending at this school; avginc: average family income of students at this school; $u$: other factors
Per student spending is likely to be correlated with average family income at a given high school because of school financing.
Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores.
In a simple regression model, the effect of per student spending would partly include the effect of family income on test scores.
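The test-score example can be illustrated with a small simulation. The sketch below is not part of the slides: the data-generating numbers (true slopes 0.5 and 0.3, the link between spending and income, the noise levels) and the helper names `simulate_test_scores` and `ols` are assumptions made up for illustration. It only shows that, when average family income is left out, the simple-regression slope on spending picks up part of the income effect.

```python
import numpy as np


def simulate_test_scores(n=5000, seed=0):
    """Simulate hypothetical school data; every number here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    avginc = rng.normal(50.0, 10.0, size=n)                      # average family income
    expend = 2.0 + 0.1 * avginc + rng.normal(0.0, 1.0, size=n)   # spending rises with income
    u = rng.normal(0.0, 2.0, size=n)                             # all other factors
    avgscore = 10.0 + 0.5 * expend + 0.3 * avginc + u            # assumed true model
    return avgscore, expend, avginc


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) computed by least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


avgscore, expend, avginc = simulate_test_scores()
b_simple = ols(avgscore, expend)              # avginc omitted
b_multiple = ols(avgscore, expend, avginc)    # avginc held fixed

print("slope on expend, simple regression:  ", round(b_simple[1], 3))
print("slope on expend, multiple regression:", round(b_multiple[1], 3))
```

Under these assumed numbers the multiple-regression slope should land close to the assumed true value of 0.5, while the simple-regression slope should be noticeably larger because it also reflects the income effect.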
Example: Family income and family consumption
$\mathit{cons} = \beta_0 + \beta_1\,\mathit{inc} + \beta_2\,\mathit{inc}^2 + u$
cons: family consumption; inc: family income; $\mathit{inc}^2$: family income squared; $u$: other factors
Model has two explanatory variables: income and income squared.
Consumption is explained as a quadratic function of income.
One has to be very careful when interpreting the coefficients:
$\Delta \mathit{cons} / \Delta \mathit{inc} \approx \beta_1 + 2\beta_2\,\mathit{inc}$
By how much does consumption increase if income is increased by one unit? It depends on how much income is already there.

Example: CEO salary, sales and CEO tenure
$\log(\mathit{salary}) = \beta_0 + \beta_1 \log(\mathit{sales}) + \beta_2\,\mathit{ceoten} + \beta_3\,\mathit{ceoten}^2 + u$
$\log(\mathit{salary})$: log of CEO salary; $\log(\mathit{sales})$: log sales; $\beta_2\,\mathit{ceoten} + \beta_3\,\mathit{ceoten}^2$: quadratic function of CEO tenure with firm
Model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm.
Model assumes a quadratic relationship between CEO salary and his or her tenure with the firm.
Meaning of "linear" regression: the model has to be linear in the parameters (not in the variables).

3.2 Mechanics and Interpretation of OLS
3.2.1 Obtaining the OLS Estimates
3.2.2 Interpreting the OLS Regression Equation
3.2.3 OLS Fitted Values and Residuals
3.2.4 A "Partialling Out" Interpretation of Multiple Regression
3.2.5 Comparison of Simple and Multiple Regression Estimates
3.2.6 Goodness of Fit
3.2.7 Regression through the Origin

3.2.1 Obtaining the OLS Estimates

OLS estimation of the multiple regression model
Random sample: $\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i) : i = 1, \dots, n\}$
Regression residuals: $\hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik}$
Minimize sum of squared residuals: $\min \sum_{i=1}^{n} \hat{u}_i^2 \rightarrow \hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$
Minimization will be carried out by computer.

3.2.2 Interpreting the OLS Regression Equation

Interpretation of the multiple regression model
The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in
reality, they are correlated with the explanatory variable under consideration.
"Ceteris paribus" interpretation
It still has to be assumed that unobserved factors do not change if the explanatory variables are changed.
$\beta_j = \partial y / \partial x_j$: by how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant.

Example 3.1: Determinants of college GPA
colGPA: grade point average at college; hsGPA: high school grade point average; ACT: achievement test score
Interpretation:
Holding ACT fixed, another point on high school grade point average is associated with another .453 points of college grade point average.
Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B.
Holding high school grade point average fixed, another 10 points on ACT are associated with less than one point on college GPA.

Example 3.2: Hourly Wage Equation
EViews workfile: wage1.wf1
    ls log(wage) c educ exper tenure

3.2.3 OLS Fitted Values and Residuals

Properties of OLS on any sample of data
Fitted values and residuals:
Fitted or predicted values: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik}$
Residuals: $\hat{u}_i = y_i - \hat{y}_i$
Algebraic properties of OLS regression:
$\sum_{i=1}^{n} \hat{u}_i = 0$: deviations from the regression line sum up to zero.
$\sum_{i=1}^{n} x_{ij}\hat{u}_i = 0$: correlations between deviations and regressors are zero.
$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}_1 + \dots + \hat\beta_k \bar{x}_k$: sample averages of y and of the regressors lie on the regression line.

3.2.4 A "Partialling Out" Interpretation of Multiple Regression

One can show that the estimated coefficient of an explanatory variable in a multiple regression can be obtained in two steps:
1) Regress the explanatory variable on all other explanatory
variables
2) Regress y on the residuals from this regression
EViews workfile: wage1.wf1
    ls log(wage) c educ exper tenure   ' full multiple regression
    ls educ c exper tenure             ' step 1: educ on the other regressors
    series r1 = resid                  ' save the step-1 residuals
    ls log(wage) c r1                  ' step 2: log(wage) on those residuals

Why does this procedure work?
The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables:
    ls educ c exper tenure
    series r1 = resid
The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable:
    ls log(wage) c r1

3.2.5 Comparison of Simple and Multiple Regression Estimates

Example 3.3: Participation in 401(k) Pension Plans
mrate = the amount the firm contributes to a worker's fund for each dollar the worker contributes
prate = the percentage of eligible workers having a 401(k) account

3.2.6 Goodness of Fit

Decomposition of total variation: $SST = SSE + SSR$ (total = explained + residual sum of squares)
R-squared: $R^2 = SSE/SST = 1 - SSR/SST$
Alternative expression for R-squared: R-squared is equal to the squared correlation coefficient between the actual and the predicted value of the dependent variable.
Notice that R-squared never decreases when another explanatory variable is added to the regression. This algebraic fact follows because, by definition, the sum of squared residuals never increases when additional regressors are added to the model.

Example: Explaining arrest records
Interpretation:
Proportion prior arrests +0.5 → -.075 = -7.5 arrests per 100 men
Months in prison +12 → -.034(12) = -0.408 arrests for given man
Quarters employed +1 → -.104 = -10.4 arrests per 100
men
Variables: number of times arrested 1986; proportion of prior arrests that led to conviction; months in prison 1986; quarters employed 1986

Example: Explaining arrest records (cont.)
An additional explanatory variable is added: average sentence in prior convictions
R-squared increases only slightly
Interpretation:
Average prior sentence increases number of arrests (?)
Limited additional explanatory power, as R-squared increases by little
General remark on R-squared:
Even if R-squared is small (as in the given example), regression may still provide good estimates of ceteris paribus effects.

3.2.7 Regression through the Origin

The decomposition of the total variation in y usually does not hold.
R-squared might be negative.
Some economists propose to calculate R-squared as the squared correlation coefficient between the actual and fitted values of y.
The cost of estimating an intercept when it is truly zero.

3.3 The Expected Value of the OLS Estimators
3.3.1 Assumptions and Unbiasedness of OLS
3.3.2 Including Irrelevant Variables
3.3.3 Omitted Variable Bias

3.3.1 Assumptions and Unbiasedness of OLS

Standard assumptions for the multiple regression model
Assumption MLR.1 (Linear in parameters): $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$
In the population, the relationship between y and the explanatory variables is linear, as is the relationship between y and the disturbance.
Assumption MLR.2 (Random sampling)
The data is a random sample drawn from the population.
Each data point therefore follows the population equation.

Standard assumptions for the multiple regression model (cont.)
Assumption MLR.3 (No perfect collinearity)
Remarks on MLR.3:
The assumption only rules out perfect
collinearity/correlation between explanatory variables; imperfect correlation is allowed.
If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated.
Constant variables are also ruled out (collinear with intercept).
$n \geq k+1$ (the sample must contain at least as many observations as there are parameters)
Statement of MLR.3: "In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables."

Example for perfect collinearity: small sample
In a small sample, avginc may accidentally be an exact multiple of expend; it will not be possible to disentangle their separate effects because there is exact covariation.
Example for perfect collinearity: relationships between regressors
Either shareA or shareB will have to be dropped from the regression because there is an exact linear relationship between them: shareA + shareB = 1.

Standard assumptions for the multiple regression model (cont.)
Assumption MLR.4 (Zero conditional mean): $E(u \mid x_1, x_2, \dots, x_k) = 0$
The values of the explanatory variables must contain no information about the mean of the unobserved factors.
In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error.
Example: Average test scores
If avginc were not included in the regression, it would end up in the error term; it would then be hard to defend that expend is uncorrelated with the error.

Discussion of the zero conditional mean assumption
MLR.4 implies $\mathrm{cov}(u, x_j) = 0,\ j = 1, \dots, k$.
Functional form misspecification, omitted variables, measurement error, and simultaneous equations can cause $\mathrm{cov}(u, x_j) \neq 0$.
Explanatory variables that are correlated with the error term are called
endogenous; endogeneity is a violation of assumption MLR.4.
Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous.
Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators.

Theorem 3.1 (Unbiasedness of OLS)
Under assumptions MLR.1 through MLR.4, $E(\hat\beta_j) = \beta_j,\ j = 0, 1, \dots, k$.
Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values.
PROOF: [derivation shown on the slide; not recovered in the extracted text]

3.3.2 Including Irrelevant Variables

Including irrelevant variables in a regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$, where $\beta_3 = 0$ in the population
No problem because $E(\hat\beta_3) = \beta_3 = 0$.
However, including irrelevant variables may increase sampling variance.

3.3.3 Omitted Variable Bias

Omitting relevant variables: the simple case
True model (contains x1 and x2): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
Estimated model (x2 is omitted): $y = \alpha_0 + \alpha_1 x_1 + w$
Omitted variable bias
If x1 and x2 are correlated, assume a linear regression relationship between them: $x_2 = \delta_0 + \delta_1 x_1 + v$
Substituting into the true model:
$y = \beta_0 + \beta_1 x_1 + \beta_2(\delta_0 + \delta_1 x_1 + v) + u = (\beta_0 + \beta_2\delta_0) + (\beta_1 + \beta_2\delta_1) x_1 + (\beta_2 v + u)$
$\beta_0 + \beta_2\delta_0$: if y is only regressed on x1, this will be the estimated intercept.
$\beta_1 + \beta_2\delta_1$: if y is only regressed on x1, this will be the estimated slope on x1.
$\beta_2 v + u$: error term
Conclusion: All estimated coefficients will be biased.

Example: Omitting ability in a wage equation
$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{abil} + u$, with $\mathit{abil} = \delta_0 + \delta_1\,\mathit{educ} + v$
$\beta_2$ and $\delta_1$ will both be positive.
The return to education $\beta_1$ will be overestimated because $\beta_2\delta_1 > 0$. It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.
When is there no omitted variable bias? If the omitted variable is irrelevant ($\beta_2 = 0$) or uncorrelated with the included regressor ($\delta_1 = 0$).

Omitted variable
bias: more general cases
True model (contains x1, x2 and x3): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
Estimated model (x3 is omitted): $y$ is regressed on $x_1$ and $x_2$ only
No general statements are possible about the direction of the bias.
The analysis proceeds as in the simple case if one regressor is uncorrelated with the others.
Example: Omitting ability in a wage equation
$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{abil} + u$
If exper is approximately uncorrelated with educ and abil, then the direction of the omitted variable bias can be analyzed as in the simple two-variable case.

3.4 The Variance of the OLS Estimators
3.4.1 The Components of the OLS Variances: Multicollinearity
3.4.2 Variances in Misspecified Models

Theorem 3.2 Sampling Variances of the O