Multiple Regression Analysis: Further Issues.ppt
Copyright 2007 Thomson Asia Pte. Ltd. All rights reserved.

Multiple Regression Analysis: Further Issues

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$

6.1 Effects of Data Scaling on OLS Statistics

Changing the scale of the y variable leads to a corresponding change in the scale of the coefficients and standard errors, so the t statistics, the significance, and the interpretation do not change.
Changing the scale of one x variable changes the scale of that variable's coefficient and standard error, so again there is no change in significance or interpretation.
When the dependent variable or an independent variable appears in logarithmic form, changing the unit of measurement does not affect the slope coefficients; only the intercept changes.

Beta Coefficients

Occasionally you'll see a reference to a "standardized coefficient" or "beta coefficient," which has a specific meaning.
The idea is to replace y and each x variable with a standardized version, i.e., subtract the mean and divide by the standard deviation.
Each coefficient then gives the change in y, measured in standard deviations of y, for a one standard deviation change in x.

Beta Coefficients (cont)

Example 6.1: Effects of Pollution on Housing Prices (data set: HPRICE2)

Stata command (regression reporting standardized coefficients):
    reg price nox crime rooms dist stratio, beta

6.2 More on Functional Form

OLS can be used for relationships that are not strictly linear in x and y by using nonlinear functions of x and y; the model is still linear in the parameters.
Can take the natural log of x, y, or both.
Can use quadratic forms of x.
Can use interactions of x variables.
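The scaling and beta-coefficient claims in Section 6.1 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides: the data are made up for illustration, and `simple_ols` and `sample_sd` are hypothetical helper names for a one-regressor OLS fit and a sample standard deviation.

```python
import math

def simple_ols(x, y):
    """OLS of y on a constant and one regressor x.
    Returns (intercept, slope, standard error of the slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / (n - 2)              # estimated error variance
    return intercept, slope, math.sqrt(sigma2 / sxx)

def sample_sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

# Made-up illustrative data (an assumption, not from HPRICE2).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

# 1) Rescale y (e.g., thousands of dollars to dollars).
b0, b1, se_b1 = simple_ols(x, y)
y_scaled = [1000.0 * yi for yi in y]
c0, c1, se_c1 = simple_ols(x, y_scaled)
t_original = b1 / se_b1
t_scaled = c1 / se_c1                   # slope and se both scale by 1000

# 2) Beta coefficient: slope times sd(x) / sd(y) ...
beta_formula = b1 * sample_sd(x) / sample_sd(y)

# ... which matches the slope from regressing z-scores on z-scores.
mx, my = sum(x) / len(x), sum(y) / len(y)
zx = [(xi - mx) / sample_sd(x) for xi in x]
zy = [(yi - my) / sample_sd(y) for yi in y]
_, beta_zscores, _ = simple_ols(zx, zy)

print("t statistic before and after rescaling:", t_original, t_scaled)
print("beta coefficient (two ways):", beta_formula, beta_zscores)
```

The slope and its standard error are both multiplied by 1000, so their ratio, the t statistic, is unchanged. The two beta calculations agree, which is what Stata's `beta` option relies on.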
Interpretation of Log Models

$\ln(y) = \beta_0 + \beta_1 \ln(x) + u$: $\beta_1$ is the elasticity of y with respect to x.
$\ln(y) = \beta_0 + \beta_1 x + u$: $100\beta_1$ is approximately the percentage change in y given a 1-unit change in x.
$y = \beta_0 + \beta_1 \ln(x) + u$: $\beta_1/100$ is approximately the change in y for a 1 percent change in x.

Why use log models?

1. Taking natural logs gives the coefficients an appealing interpretation: they can be read directly as elasticities.
2. The slope coefficients do not change when the units of measurement change.
3. Taking logs often mitigates heteroskedasticity, even if it does not eliminate it.
4. Taking logs usually narrows the range of a variable, in some cases considerably.
5. Drawbacks: the variable cannot take zero or negative values, and it is harder to predict the original variable, because the model predicts log(y) rather than y.

Quadratic Models

For a model of the form $y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$ we can't interpret $\beta_1$ alone as measuring the change in y with respect to x; we need to take $\beta_2$ into account as well, since
$\Delta\hat{y} \approx (\hat{\beta}_1 + 2\hat{\beta}_2 x)\,\Delta x$.

More on Quadratic Models

Suppose that the coefficient on x is positive and the coefficient on $x^2$ is negative. Then y is increasing in x at first, but will eventually turn around and be decreasing in x. The turning point is at $x^* = -\hat{\beta}_1/(2\hat{\beta}_2)$.

More on Quadratic Models

Suppose that the coefficient on x is negative and the coefficient on $x^2$ is positive. Then y is decreasing in x at first, but will eventually turn around and be increasing in x.

How to describe decreasing effect

How to describe increasing effect

Quadratic model example: effects of pollution on housing prices (data set: HPRICE2)

[Estimated equation not recovered from the slide.] Reported standard errors: (0.57), (0.115), (0.043), (0.165), (0.013), (0.006); n = 506, $R^2 = 0.603$. The estimated coefficient on rooms is -0.545 and the coefficient on rooms squared is 0.062.

To the right of the turning point rooms* = 0.545/(2 × 0.062) ≈ 4.4, an additional room has an increasing effect on the percentage change in price.
For example, going from 5 to 6 rooms raises price by about -54.5 + 12.4 × 5 = 7.5%; going from 6 to 7 rooms raises price by about -54.5 + 12.4 × 6 = 19.9%. This is a very strong increasing effect.

Stata commands:
    gen rooms2 = rooms*rooms
    gen ldist = log(dist)
    reg lprice lnox ldist rooms rooms2 stratio
    display -1*_b[rooms]/(2*_b[rooms2])       // turning point
    display 100*(_b[rooms] + 2*_b[rooms2]*6)  // percentage increase in price when rooms goes from 6 to 7
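The turning-point arithmetic in the HPRICE2 quadratic example can be reproduced without the data set. This is a small sketch in Python that uses only the two coefficients quoted on the slide (-0.545 on rooms, 0.062 on rooms squared); the function names `turning_point` and `pct_effect` are illustrative and do not come from the original deck.

```python
def turning_point(b1, b2):
    """Value of x where the quadratic b1*x + b2*x**2 turns around."""
    return -b1 / (2.0 * b2)

def pct_effect(b1, b2, x):
    """Approximate percentage change in y for a one-unit increase in x,
    starting from x, when the dependent variable is log(y)."""
    return 100.0 * (b1 + 2.0 * b2 * x)

# Coefficients on rooms and rooms^2 as quoted on the slide.
b_rooms = -0.545
b_rooms2 = 0.062

rooms_star = turning_point(b_rooms, b_rooms2)      # about 4.4
effect_5_to_6 = pct_effect(b_rooms, b_rooms2, 5)   # about 7.5 percent
effect_6_to_7 = pct_effect(b_rooms, b_rooms2, 6)   # about 19.9 percent

print("turning point:", round(rooms_star, 2))
print("5 -> 6 rooms:", round(effect_5_to_6, 1), "percent")
print("6 -> 7 rooms:", round(effect_6_to_7, 1), "percent")
```

These are the same two quantities that the Stata `display` lines compute from `_b[rooms]` and `_b[rooms2]` after the regression.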
Interaction Terms

For a model of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u$ we can't interpret $\beta_1$ alone as measuring the change in y with respect to $x_1$; we need to take $\beta_3$ into account as well, since
$\partial y / \partial x_1 = \beta_1 + \beta_3 x_2$.

Models with interaction effects usually need to be reparameterized. In the original model,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u$,
$\beta_2$ is the partial effect of $x_2$ on y when $x_1 = 0$, which is usually not of interest. We instead reparameterize the model as
$y = \alpha_0 + \delta_1 x_1 + \delta_2 x_2 + \beta_3 (x_1 - \mu_1)(x_2 - \mu_2) + u$,
where $\mu_1$ and $\mu_2$ are the population means of $x_1$ and $x_2$. It is easy to show that $\delta_2 = \beta_2 + \beta_3 \mu_1$ (and likewise $\delta_1 = \beta_1 + \beta_3 \mu_2$), so we immediately obtain the partial effects at the mean values.

Example: Effects of Attendance on Final Exam Performance (data set: ATTEND.DTA)

The coefficient on atndrte is negative. Does this mean that attending lectures has a negative effect on the final exam score? No: $\beta_1$ measures the effect only when priGPA = 0.
The t statistics on atndrte and on priGPA·atndrte are individually insignificant. Does this mean that neither affects the final exam score? No: the F test of their joint significance has a p-value of 0.014.

Partial effect of atndrte on stndfnl: at the mean of priGPA (2.59), a 10 percentage point increase in atndrte raises stndfnl by 0.078 standard deviations of the final exam score.

Stata commands:
    sum priGPA
    gen priGPA2 = priGPA*priGPA
    gen ACT2 = ACT*ACT
    gen priatn = priGPA*atndrte
    reg stndfnl atndrte priGPA priGPA2 ACT2 priatn

6.3 More on Goodness-of-Fit and Selection of Regressors

Recall that the $R^2$ never decreases as more variables are added to the model.
The adjusted $R^2$ takes into account the number of variables in a model, and may decrease.

The role of the adjusted R-squared: it imposes a penalty for adding independent variables to a model. When a new independent variable is added to the regression, SSR falls, but so do the degrees of freedom df = n - k - 1, so SSR/(n - k - 1) can rise or fall. As a result, when one new variable is added, the adjusted R-squared increases if and only if the absolute value of the new variable's t statistic is greater than 1; when a group of variables is added, it increases if and only if the F statistic for the joint significance of the new variables is greater than 1.

Adjusted R-Squared (cont)

It's easy to see that the adjusted $R^2$ is just $1 - (1 - R^2)(n - 1)/(n - k - 1)$, but most packages will give you both $R^2$ and adj-$R^2$.
You can compare the fit of 2 models (with the same y) by comparing the adj-$R^2$.
You cannot use the adj-$R^2$ to compare models with different y's (e.g., y vs. ln(y)).

Using adjusted R-squared to choose between nonnested models
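The adjusted R-squared formula from Section 6.3 is easy to evaluate by hand. The sketch below, in Python, applies it to the n = 506 and R-squared = 0.603 reported for the HPRICE2 quadratic regression, with k = 5 regressors. The second call uses a hypothetical R-squared of 0.6035 for a model with one extra regressor; that number is an assumption chosen for illustration, not a result from the slides.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1),
    where n is the sample size and k the number of regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Reported on the slides: n = 506, R^2 = 0.603, five regressors.
adj_base = adjusted_r2(0.603, 506, 5)

# Hypothetical: a sixth regressor nudges R^2 up to 0.6035.
adj_bigger = adjusted_r2(0.6035, 506, 6)

print("adjusted R^2, k = 5:", round(adj_base, 5))
print("adjusted R^2, k = 6 (hypothetical):", round(adj_bigger, 5))
```

In the hypothetical case R-squared rises slightly while the adjusted R-squared falls, which is the penalty for an added variable whose t statistic is below 1 in absolute value.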
Controlling for Too Many Factors in Regression Analysis

It is important not to fixate too much on adj-$R^2$ and lose sight of theory and common sense.
If economic theory clearly predicts that a variable belongs, generally leave it in.
Don't include a variable that prevents a sensible interpretation of the variable of interest; remember the ceteris paribus interpretation of multiple regression.

In a regression model studying the effect of beer taxes on traffic fatality rates, should per capita beer consumption be included in the model? The coefficient on tax would then measure the difference in fatalities due to a one percentage point increase in tax, holding beercons fixed. Does that statement make sense?

Adding Regressors to Reduce the Error Variance

Adding a new independent variable to a regression can exacerbate multicollinearity; on the other hand, taking a factor out of the error term and using it as an explanatory variable reduces the error variance. We should include independent variables that affect y and are uncorrelated with all of the independent variables of interest.

6.4 Prediction and Residual Analysis

Suppose we want to use our estimates to obtain a specific prediction. First, suppose that we want an estimate of
$E(y \mid x_1 = c_1, \dots, x_k = c_k) = \theta_0 = \beta_0 + \beta_1 c_1 + \dots + \beta_k c_k$.
This is easy to obtain by substituting the c's for the x's in our estimated model, but what about a standard error? It is really just a test of a linear combination.

Predictions (cont)

Can rewrite as $\beta_0 = \theta_0 - \beta_1 c_1 - \dots - \beta_k c_k$.
Substitute in to obtain $y = \theta_0 + \beta_1 (x_1 - c_1) + \dots + \beta_k (x_k - c_k) + u$.
So, if you regress $y_i$ on $(x_{ij} - c_j)$, the intercept will give the predicted value and its standard error.
Note that the standard error will be smallest when the c's equal the means of the x's.

Example: CI for predicted college GPA

Predictions (cont)

This standard error for the expected value is not the same as a standard error for a new outcome on y. We also need to take into account the variance in the unobserved error.
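The recentering trick for predictions can be illustrated with a one-regressor model. The sketch below is a simplified Python example under stated assumptions: the data are made up, `ols_with_intercept_se` and `predict_at` are hypothetical helper names, and the standard error for a new outcome is computed as the square root of se(prediction) squared plus the estimated error variance, the standard textbook formula, since the corresponding slide formula did not survive extraction.

```python
import math

def ols_with_intercept_se(x, y):
    """OLS of y on a constant and x.
    Returns (intercept, slope, se of intercept, estimated error variance)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + xbar ** 2 / sxx))
    return intercept, slope, se_intercept, sigma2

def predict_at(x, y, c):
    """Regress y on (x - c): the intercept is the prediction at x = c
    and its standard error is the standard error of that prediction."""
    shifted = [xi - c for xi in x]
    theta0, _, se_theta0, sigma2 = ols_with_intercept_se(shifted, y)
    # A new outcome also carries the unobserved error's variance.
    se_new_outcome = math.sqrt(se_theta0 ** 2 + sigma2)
    return theta0, se_theta0, se_new_outcome

# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

b0, b1, _, _ = ols_with_intercept_se(x, y)
x_mean = sum(x) / len(x)

pred_mean, se_mean, se_new_mean = predict_at(x, y, x_mean)
pred_far, se_far, se_new_far = predict_at(x, y, 8.0)

print("prediction at mean of x:", pred_mean, "se:", se_mean)
print("prediction at x = 8:", pred_far, "se:", se_far)
print("se for a new outcome at x = 8:", se_new_far)
```

The intercept of the recentered regression equals b0 + b1*c, its standard error is smallest at the sample mean of x, and the standard error for a new outcome is always larger than the standard error for the expected value.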
Prediction Interval

Usually the estimate of $\sigma^2$ is much larger than the variance of the prediction, so this prediction interval will be a lot wider than the simple confidence interval for the prediction.

Residual Analysis

Information can be obtained from looking at the residuals (i.e., predicted vs. observed).
Example: Regress the price of cars on their characteristics; big negative residuals indicate a good deal.
Example: Regress average earnings for students from a school on student characteristics; big positive residuals indicate the greatest value added.

For example, in the housing price model using HPRICE1.RAW.

Predicting y in a Log Model

Simple exponentiation of the predicted ln(y) will underestimate the expected value of y.
Instead, we need to scale this up by an estimate of the expected value of exp(u).

Example 6.7: Predicting CEO Salaries

Comparing Log and Level Models

A by-product of the previous procedure is a method to compare a model in logs with one in levels.
Take the fitted values from the auxiliary regression, and find the sample correlation between these and y.
Compare the $R^2$ from the levels regression with this correlation squared.

Example 6.8: Predicting CEO Salaries
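The log-model prediction steps can be sketched as follows. The slides' own formulas for the scale factor did not survive extraction, so this Python example makes two assumptions and labels them: it estimates E[exp(u)] both by the average of exp(residual) (a smearing-type estimate) and by a regression of y on exp(fitted log y) through the origin, which is one reading of the "auxiliary regression" the slides mention. The data and all names (`simple_ols`, `corr`, `alpha_smear`, `alpha_origin`) are illustrative.

```python
import math

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def corr(a, b):
    """Sample correlation between two equal-length lists."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    cov = sum((ai - am) * (bi - bm) for ai, bi in zip(a, b))
    var_a = sum((ai - am) ** 2 for ai in a)
    var_b = sum((bi - bm) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up positive data (an assumption, not the CEO salary data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.5, 2.6, 2.9, 4.4, 5.1, 8.0, 9.7]

# Step 1: regress log(y) on x and exponentiate the fitted values.
log_y = [math.log(v) for v in y]
b0, b1 = simple_ols(x, log_y)
log_fit = [b0 + b1 * xi for xi in x]
m = [math.exp(v) for v in log_fit]

# Step 2a (assumed estimator): average of exp(residual).
resid = [ly - lf for ly, lf in zip(log_y, log_fit)]
alpha_smear = sum(math.exp(u) for u in resid) / len(resid)

# Step 2b (assumed estimator): regress y on m through the origin.
alpha_origin = sum(mi * yi for mi, yi in zip(m, y)) / sum(mi * mi for mi in m)

# Step 3: scale the naive prediction at a chosen x0.
x0 = 5.0
naive_prediction = math.exp(b0 + b1 * x0)
scaled_prediction = alpha_smear * naive_prediction

# Comparing log and level models: squared correlation between y and the
# exponentiated fitted values, against the R^2 of the levels regression.
r2_from_log_model = corr(m, y) ** 2
a0, a1 = simple_ols(x, y)
r2_levels = corr([a0 + a1 * xi for xi in x], y) ** 2

print("scale factors:", alpha_smear, alpha_origin)
print("naive vs scaled prediction:", naive_prediction, scaled_prediction)
print("R^2 comparison (log model in levels, levels model):",
      r2_from_log_model, r2_levels)
```

The averaged exp(residual) factor is at least 1 because OLS residuals average to zero, so it always scales the naive prediction up; the through-the-origin estimate carries no such guarantee. Both R-squared values measure fit for y in levels, which is what makes them comparable.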