Instrumental Variables and Two-Stage Least Squares: PPT Courseware
Intermediate Econometrics, Yan Shen

Instrumental Variables & 2SLS
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
$$x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + \dots + \pi_k x_k + v$$

Chapter Outline
Omitted variables in a simple regression model
IV estimation of the multiple regression model
Two-stage least squares
IV solutions to the errors-in-variables problem
Testing for endogeneity

Lecture Outline
Motivation: why use IV?
Statistical inference with the IV estimator
Properties of IV with a poor instrument
Computing R-squared after IV
IV estimation of the multiple regression model

Problem to Start With
If important variables are omitted, what should we do?

The Ways Out
Ignore the problem and pretend that it does not exist.
Find and use a suitable proxy variable.
Use an estimation method that recognizes the presence of the omitted variable.

Why Use Instrumental Variables?
Instrumental variables (IV) estimation is used when the model has endogenous x's, that is, when $\mathrm{Cov}(x,u) \neq 0$.
Thus, IV can be used to address the problem of omitted variable bias. Additionally, IV can be used to solve the classic errors-in-variables problem.

Instrumental Variable: Who Qualifies?
For a variable z to serve as a valid instrument for the endogenous variable x, the following must be true:
The instrument must be exogenous, that is, $\mathrm{Cov}(z,u) = 0$ (15.4).
The instrument must be correlated with the endogenous variable x, that is, $\mathrm{Cov}(z,x) \neq 0$ (15.5).

About Cov(z,u)
Because u is unobserved, this assumption cannot be checked directly from the data. We have to use common sense and economic theory to decide whether it makes sense to assume $\mathrm{Cov}(z,u) = 0$.

About Cov(z,x)
We can test whether $\mathrm{Cov}(z,x) \neq 0$: just test $H_0\colon \pi_1 = 0$ in $x = \pi_0 + \pi_1 z + v$. This regression is sometimes called the first-stage regression.

Example: Wage Determination
Suppose the true model regresses log(wage) on education (educ) and ability (abil). Ability is unobserved, and the proxy IQ is not available, so the regression actually run is log(wage) on educ alone.
Problem: the error term now contains ability, and education is correlated with ability, so educ is endogenous.
A good IV needs to be highly correlated with education but uncorrelated with the error term.
Is IQ a good instrument? No. It is correlated with both education and the error term.
IVs used in the literature:
Mother's education.
Number of siblings. Hypothesis: more siblings are associated with lower average levels of education.
If we wish to use either of them as an IV, we need to be confident that it is uncorrelated with ability.

When an IV Is Available: Estimation
For $y = \beta_0 + \beta_1 x + u$, and given assumptions (15.4) and (15.5), $\beta_1$ is identified. In this context, identification means that we can write $\beta_1$ in terms of population moments that can be estimated from sample data.
Since $\mathrm{Cov}(z,y) = \beta_1 \mathrm{Cov}(z,x) + \mathrm{Cov}(z,u)$ and $\mathrm{Cov}(z,u) = 0$, we have $\beta_1 = \mathrm{Cov}(z,y)/\mathrm{Cov}(z,x)$. The IV estimator of $\beta_1$ replaces the population covariances with their sample analogues:
$$\hat\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^n (z_i - \bar z)(x_i - \bar x)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$
When $z = x$ we obtain the OLS estimator of $\beta_1$. This means that when x is exogenous it can be used as its own instrument, and in that case the IV estimator is identical to OLS.
When assumptions (15.4) and (15.5) hold, applying the law of large numbers shows that the IV estimator is consistent for $\beta_1$.

When an IV Is Available: Inference
The homoskedasticity assumption is $E(u^2 \mid z) = \sigma^2 = \mathrm{Var}(u)$ (15.11). When (15.4), (15.5), and (15.11) hold, the asymptotic variance of $\hat\beta_1$ is
$$\frac{\sigma^2}{n\,\sigma_x^2\,\rho_{x,z}^2},$$
where $\sigma_x^2$ is the population variance of x and $\rho_{x,z}$ is the population correlation between x and z. Replacing population quantities with sample estimates gives the standard error
$$\mathrm{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{\mathrm{SST}_x \, R_{x,z}^2}},$$
where $\hat\sigma^2 = (n-2)^{-1}\sum_{i=1}^n \hat u_i^2$ uses the IV residuals $\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$, $\mathrm{SST}_x$ is the total sum of squares of x, and $R_{x,z}^2$ is the R-squared from regressing x on z.

IV versus OLS Estimation
The IV standard error differs from the OLS standard error only through $R_{x,z}^2$ from the regression of x on z. Since $R_{x,z}^2 < 1$, IV standard errors are larger.
When $\mathrm{Cov}(x,u) \neq 0$, OLS is inconsistent, while IV is consistent when (15.4) and (15.5) hold. The stronger the correlation between z and x, the smaller the IV standard errors.

The Effect of Poor Instruments
What if our assumption that $\mathrm{Cov}(z,u) = 0$ is false? Then the IV estimator will be inconsistent, too. We can compare the asymptotic biases of OLS and IV:
$$\mathrm{plim}\,\hat\beta_{1,\mathrm{IV}} = \beta_1 + \frac{\mathrm{Corr}(z,u)}{\mathrm{Corr}(z,x)}\cdot\frac{\sigma_u}{\sigma_x}, \qquad \mathrm{plim}\,\hat\beta_{1,\mathrm{OLS}} = \beta_1 + \mathrm{Corr}(x,u)\cdot\frac{\sigma_u}{\sigma_x}.$$
Prefer IV if $|\mathrm{Corr}(z,u)/\mathrm{Corr}(z,x)| < |\mathrm{Corr}(x,u)|$.

Reporting R-Squared after IV Estimation
Since the variance of y cannot be decomposed nicely when x and u are correlated, the R-squared is not very meaningful after IV estimation. It also cannot be used in the usual way to compute F tests of joint restrictions.

IV Estimation in the Multiple Regression Case
IV estimation can be extended to the multiple regression case. We call the model we are interested in estimating the structural model. The problem is that one or more of its explanatory variables are endogenous. We need an instrument for each endogenous variable.

The Reduced-Form Equation
A reduced-form equation expresses an endogenous variable as a function of all exogenous variables and stochastic disturbances. The corresponding parameters are called reduced-form parameters.

Multiple Regression IV
Now write the structural model as $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_1 + u_1$, where $y_2$ is endogenous and $x_1$ is exogenous. Let $z_1$ be the instrument for $y_2$. We need the following assumptions to obtain valid IV estimates.

Assumptions for Multiple Regression IV
$E(u_1) = 0$, $\mathrm{Cov}(x_1,u_1) = 0$, and $\mathrm{Cov}(z_1,u_1) = 0$. Therefore $E(z_1 u_1) = E(x_1 u_1) = 0$. We now have 3 moment conditions for 3 unknowns.

Obtaining the IV Estimators
The IV estimators $\hat\beta_0, \hat\beta_1, \hat\beta_2$ solve the sample analogues of these moment conditions:
$$\sum_{i=1}^n (y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i1}) = 0,$$
$$\sum_{i=1}^n x_{i1}(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i1}) = 0,$$
$$\sum_{i=1}^n z_{i1}(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i1}) = 0.$$

The Reduced Form for $y_2$
We still need $z_1$ to be (partially) correlated with $y_2$: in $y_2 = \pi_0 + \pi_1 x_1 + \pi_2 z_1 + v_2$ we need $\pi_2 \neq 0$. This reduced-form equation regresses the endogenous variable on all exogenous variables. The procedure extends to the case of more explanatory variables.

Example: Return to Education
Card (1995) uses wage and education data for a sample of men in 1976 to estimate the return to education. The instrumental variable is whether one grew up near a four-year college (nearc4).
For college proximity to be a valid instrument:
(1) It must be uncorrelated with the error term in the wage equation.
(2) It must be partially correlated with educ. This can be checked by regressing educ on nearc4 and all of the exogenous variables appearing in the equation.
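The first-stage check and the just-identified IV estimator for $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_1 + u_1$ can be sketched on simulated data. This is a minimal illustration, not Card's data or code: the data-generating process, the helper names `ols` and `iv_just_identified`, and all parameter values are hypothetical. The IV step solves the three sample moment conditions, written in matrix form as $Z'(y_1 - Xb) = 0$, using NumPy only.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional (homoskedastic) standard errors; X includes a constant."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))


def iv_just_identified(y1, y2, x1, z1):
    """IV for y1 = b0 + b1*y2 + b2*x1 + u1 with z1 as the single instrument for y2.

    Solves the sample moment conditions sum(w_i * u1_i) = 0 for w in {1, z1, x1},
    i.e. Z'(y1 - X b) = 0, so b = (Z'X)^{-1} Z'y1.
    """
    n = len(y1)
    X = np.column_stack([np.ones(n), y2, x1])  # regressors: constant, endogenous y2, exogenous x1
    Z = np.column_stack([np.ones(n), z1, x1])  # instruments: z1 replaces y2; x1 instruments itself
    return np.linalg.solve(Z.T @ X, Z.T @ y1)


# Simulated data (hypothetical): ability is omitted, so it ends up in u1.
rng = np.random.default_rng(0)
n = 50_000
abil = rng.normal(size=n)                 # unobserved ability
x1 = rng.normal(size=n)                   # exogenous control
z1 = rng.binomial(1, 0.5, size=n)         # binary instrument, like a "grew up near a college" dummy
y2 = 12 + 0.5 * z1 + 0.3 * x1 + abil + rng.normal(size=n)                      # endogenous regressor
y1 = 1.0 + 0.08 * y2 + 0.2 * x1 + 0.1 * abil + rng.normal(scale=0.5, size=n)  # true b1 = 0.08

# First stage / reduced form: y2 on a constant, x1 and z1; z1 must be partially correlated with y2.
pi, pi_se = ols(y2, np.column_stack([np.ones(n), x1, z1]))
t_first = pi[2] / pi_se[2]

b_ols, _ = ols(y1, np.column_stack([np.ones(n), y2, x1]))
b_iv = iv_just_identified(y1, y2, x1, z1)
print(f"first stage: coef on z1 = {pi[2]:.3f}, t = {t_first:.1f}")
print(f"slope on y2: OLS = {b_ols[1]:.3f}, IV = {b_iv[1]:.3f} (true 0.08)")
```

In this design OLS is pulled upward by the omitted ability, while IV centers near the true slope with a noticeably larger standard error. Passing `y2` as its own instrument, `iv_just_identified(y1, y2, x1, y2)`, makes $Z = X$ and reproduces the OLS coefficients, matching the point that an exogenous x can serve as its own IV.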
Estimating the reduced form for educ (educ on nearc4 and the exogenous variables) gives a coefficient on nearc4 of 0.32 with a standard error of 0.088. This implies that in 1976, other things being fixed, people who lived near a college had on average about one-third of a year (roughly four months) more education than those who did not.
The IV estimate of the return to education is almost twice as large as the OLS estimate, and its standard error is over 18 times larger.
Wide confidence intervals are the price we must pay to get a consistent estimator of the return to education when we think educ is endogenous.
Because OLS minimizes the sum of squared residuals, it is not surprising that the IV estimation has a smaller R-squared.

Two-Stage Least Squares (2SLS)
It is possible to have multiple instruments. Consider our original structural model, writing its exogenous explanatory variable as $z_1$: $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1$. Let
$$y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3 + v_2.$$
Here we are assuming that both $z_2$ and $z_3$ are valid instruments: they do not appear in the structural model and are uncorrelated with the structural error term $u_1$.

Best Instrument
We could use either $z_2$ or $z_3$ as an instrument. The best instrument is a linear combination of all of the exogenous variables, $y_2^* = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3$. We can estimate $y_2^*$ by regressing $y_2$ on $z_1$, $z_2$, and $z_3$; this is called the first stage. If we then substitute the fitted values $\hat y_2$ for $y_2$ in the structural model and run OLS, we get the same coefficients as IV (2SLS).

More on 2SLS
While the coefficients are the same, the standard errors from doing 2SLS by hand are incorrect, because the second-stage residuals are computed with $\hat y_2$ instead of $y_2$, so let Stata do it for you. The method extends to multiple endogenous variables; we need to be sure that we have at least as many excluded exogenous variables (instruments) as there are endogenous variables in the structural equation.
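The two stages, and why by-hand standard errors go wrong, can be sketched in a few lines. This is a minimal illustration under assumed conditions (homoskedastic errors, simulated data, a hypothetical helper `two_sls`), not Stata's implementation: the coefficients come from OLS of $y_1$ on the first-stage fitted values, while the correct error variance uses residuals built from the actual $y_2$. The sketch divides by $n - k$; some software divides by $n$ instead.

```python
import numpy as np


def two_sls(y1, X, Z):
    """2SLS of y1 on X with instrument matrix Z (both include a constant column).

    Returns the coefficients, conventional homoskedastic 2SLS standard errors, and
    the naive standard errors a by-hand second-stage OLS regression would report.
    """
    n, k = X.shape
    # First stage: regress each column of X on Z and keep fitted values. Columns of X
    # that are also in Z (constant, z1) are reproduced; y2 becomes y2-hat.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of y1 on X_hat gives the 2SLS coefficients.
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    beta = XhXh_inv @ (X_hat.T @ y1)
    # Correct residuals plug the ORIGINAL regressors (actual y2) into the structural equation.
    u = y1 - X @ beta
    se = np.sqrt(u @ u / (n - k) * np.diag(XhXh_inv))
    # By-hand second-stage residuals use y2-hat instead, which misstates the error variance.
    u_naive = y1 - X_hat @ beta
    se_naive = np.sqrt(u_naive @ u_naive / (n - k) * np.diag(XhXh_inv))
    return beta, se, se_naive


# Simulated data (hypothetical): z2, z3 are excluded instruments; z1 is an included exogenous regressor.
rng = np.random.default_rng(1)
n = 20_000
z1, z2, z3 = rng.normal(size=(3, n))
v2 = rng.normal(size=n)
u1 = 0.8 * v2 + rng.normal(size=n)        # Corr(u1, v2) != 0 makes y2 endogenous
y2 = 1.0 + 0.5 * z1 + 0.7 * z2 + 0.4 * z3 + v2
y1 = 2.0 + 1.5 * y2 - 1.0 * z1 + u1       # structural equation, true b1 = 1.5

X = np.column_stack([np.ones(n), y2, z1])
Z = np.column_stack([np.ones(n), z1, z2, z3])
beta, se, se_naive = two_sls(y1, X, Z)
print("2SLS coefficients:", beta.round(3))
print("correct SEs:", se.round(4), " naive by-hand SEs:", se_naive.round(4))
```

Both sets of standard errors share the same $(\hat X'\hat X)^{-1}$ matrix; they differ only in the estimated error variance. In this particular simulation the naive standard errors come out larger, but the direction depends on the data, which is why the by-hand numbers should not be reported.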