Multiple Regression Analysis.ppt
(70 slides)
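Multiple Regression Analysis

Outline
- Omitted variable bias
- The multiple regression model
- Estimating the multiple regression model
- The multiple regression model: an example
- Analysis of variance and parameter tests
- Some important issues in multiple regression

Omitted Variable Bias
- We no longer assume that the explanatory variables are fixed values; they are treated as random variables.
- A simple regression model has only one explanatory variable. In most cases, however, the explained variable Y can be explained by more than one variable.
- For example, besides education, income may also be affected by other variables such as work experience.
- Moreover, using only one explanatory variable can produce omitted variable bias.
- Suppose the explanatory variable (say, education) is correlated with another variable (say, parental income): in general, the higher the parents' income, the better the education their children can obtain, and so the higher the children's level of education.
- Suppose also that this variable (parental income) directly affects the explained variable (income): in general, the higher the parents' income, the more other resources they invest in their children, and so the higher the children's income.
- If the regression model leaves this variable out, omitted variable bias results.

Let X be the included explanatory variable, Z the omitted variable, and Y the explained variable. A variable Z is an omitted variable in the regression model only if it satisfies both of the following conditions:
- Z is correlated with the included explanatory variable: Corr(X, Z) ≠ 0.
- Z also directly affects the explained variable Y.

Suppose the true model is [equation missing in source].
The estimated model is [equation missing in source].
The covariance between Xi and the error term is [equation missing in source].
Therefore, [equation missing in source]. Since [expression missing in source], we have [equation missing in source].

Effects of an Omitted Variable
- Omitted variable bias does not shrink as the sample size grows.
- In short, if we leave out an omitted variable, the estimator of the included explanatory variable's coefficient is not a consistent estimator of that parameter.
- The size of the bias depends on |Cov(X, Z)|.
- If Cov(X, Z) > 0, the bias is positive (the parameter is overestimated); if Cov(X, Z) < 0, the bias is negative (the parameter is underestimated). This holds when Z raises Y, as in the example above.

An example of omitted variable bias: the Mozart effect?
- Listening to Mozart for 10-15 minutes could raise IQ by 8 or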
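  9 points (Nature 1993).
- Students who take optional music or arts courses in high school have higher English and math test scores than those who don't.

The Multiple Regression Model
We extend the simple regression model, which has only one explanatory variable, to the following multiple regression model:
    Yi = β0 + β1 X1i + ... + βk Xki + ei
where X = (X1, ..., Xk) are the k explanatory variables in the model and ei is the random disturbance term, with [condition missing in source].

[Figure: Population Multiple Regression Model, bivariate case. Axes y, x1, x2; the response plane with intercept β0; an observed y at (x1i, x2i) lies a distance ei from the plane.]

βj is an unknown parameter [defining expression missing in source]; that is, it is the net effect of the j-th explanatory variable on Y, holding the other variables constant.

The Multiple Regression Model: Wages, Education, and Work Experience
- The multiple regression model is
    wage = β0 + β1 education + β2 experience + ei
- The simple regression model is
    wage = α + β education + ei
- Both β1 and β describe how education affects wages, but they are interpreted differently.
- β simply measures how education affects wages: when education increases by one unit (say, one year), wages increase by β units.
- We know, however, that wages have more than one explanatory variable. Once we bring other possible explanatory variables into the model (work experience in this example), β1 is interpreted as follows: holding work experience fixed, a one-unit increase in education raises wages by β1 units.
- This is the "other things being equal" (ceteris paribus) relationship between variables that economic research so often studies: for example, how price affects quantity demanded, other things being equal, or how the wage rate affects labor supply, other things being equal.

Estimating the Multiple Regression Model
- To estimate the unknown parameters of the regression model, we know that [symbols missing in source] are mutually independent, and the least squares method is [equation missing in source].
- We therefore look for the estimates that minimize the sum of squared residuals. The first-order conditions give k+1 normal equations, which we solve for the estimates.
- Many commercial packages, such as Excel, can readily find these estimates for you.

Estimation of σ²
For a model with k independent variables, the residual variance is estimated by dividing the sum of squared residuals by its degrees of freedom:
    σ̂² = Σ êi² / (n − (k+1))

The Multiple Regression Model: An Example
- A-Zhong is a delivery driver for a logistics company who is often out on the road delivering goods. His boss suspects that he slacks off in the gaps between deliveries.
- The boss therefore pulls A-Zhong's past delivery records and fits the multiple regression model
    Y = β0 + β1 X1 + β2 X2 + e
  where Y = hours spent on the road, X1 = delivery distance (km), and X2 = number of delivery stops.
- The estimated regression model is
    Ŷ = −0.39 + 0.066 X1 + 0.694 X2
- Holding the number of stops fixed, each additional kilometer adds 0.066 hours on the road; holding the distance fixed, each additional stop adds 0.694 hours.
- In this example, [test statistics missing in source]. Based on the t distribution with n − (k+1) = 10 − (2+1) = 7 degrees of freedom, the critical values at the α = 1%, 5%, and 10% significance levels are 3.499, 2.365,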
  and 1.895, respectively. Therefore, [symbol missing in source] is significant at the 1% level, while [symbol missing in source] is significant at the 10% level.
- Delivery distance and number of stops are significant both economically and statistically; that is, both are important explanatory variables for hours on the road.
- With these estimates, once the boss knows that A-Zhong has 5 stops today and a total distance of 110 km, he can predict today's hours on the road as
    −0.39 + 0.066 × 110 + 0.694 × 5 ≈ 10.35 hours
- If A-Zhong is on the road for 12 hours today, the boss can reasonably suspect that he spent about 2 hours slacking off.
- This example illustrates the two main uses of a regression model: explanation and prediction.

23.1 The Multiple Regression Model
- A chain is considering where to locate a new restaurant. Is it better to locate it far from the competition or in a more affluent area?
- Use multiple regression to describe the relationship between several explanatory variables and the response.
- Multiple regression separates the effects of each explanatory variable on the response and reveals which really matter.
Copyright 2011 Pearson Education, Inc.

23.2 Interpreting Multiple Regression
Example: Women's Apparel Stores
- Response variable: sales at stores in a chain of women's apparel (annually, in dollars per square foot of retail space).
- Two explanatory variables: median household income in the area (thousands of dollars) and number of competing apparel stores in the same mall.
- Begin with a scatterplot matrix, a table of scatterplots arranged as in a correlation matrix.
- Using a scatterplot matrix to understand the data can save considerable time later when interpreting the multiple regression results.

Scatterplot Matrix: Women's Apparel Stores
Example: Women's Apparel Stores
The scatterplot matrix for this example
- Confirms a positive linear association between sales and median household income.
- Shows a weak association between sales and number of competitors.
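The numerical counterpart of a scatterplot matrix is the correlation matrix of the same variables. The following is a minimal Python sketch, assuming NumPy is available. The eight observations are invented for illustration and are not the apparel-store data from the slides; the variable names are likewise assumptions.

```python
import numpy as np

# Hypothetical observations (NOT the apparel-store data from the slides).
sales = np.array([420.0, 455.0, 510.0, 480.0, 560.0, 530.0, 610.0, 590.0])  # $ per sq ft
income = np.array([48.0, 52.0, 60.0, 57.0, 68.0, 64.0, 75.0, 72.0])         # $ thousands
competitors = np.array([2.0, 4.0, 1.0, 3.0, 2.0, 5.0, 3.0, 4.0])            # stores in the mall

# Each row passed to corrcoef is one variable; the result is a symmetric
# 3 x 3 matrix with the same row/column order as a scatterplot matrix.
names = ["sales", "income", "competitors"]
corr = np.corrcoef(np.vstack([sales, income, competitors]))

for name, row in zip(names, corr):
    print(f"{name:>12}", " ".join(f"{value:6.3f}" for value in row))
```

In this made-up sample the sales/income entry is close to 1 and the sales/competitors entry is small, which is the kind of pattern the slide describes reading off the scatterplot matrix.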
Correlation Matrix: Women's Apparel Stores

Partial Slopes: Women's Apparel Stores

Marginal and Partial Slopes
- Partial slope: slope of an explanatory variable in a multiple regression that statistically excludes the effects of other explanatory variables.
- Marginal slope: slope of an explanatory variable in a simple regression.
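The difference between the two slopes can be illustrated with a small Python sketch, assuming NumPy is available. The data are hypothetical: sales are generated from an assumed exact linear rule so that the true partial slopes are known, and income and competitors are deliberately correlated. None of the numbers come from the slides.

```python
import numpy as np

# Hypothetical data: affluent locations attract more competitors, so the
# two explanatory variables are positively correlated.
income = np.array([40.0, 50.0, 60.0, 70.0, 80.0, 90.0])   # $ thousands
competitors = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])

# Sales built from an assumed exact linear rule (no noise), so the true
# partial slopes are known: +8 per $1,000 of income, -24 per competitor.
sales = 60.0 + 8.0 * income - 24.0 * competitors

# Multiple regression: a column of ones for the intercept plus both variables.
X = np.column_stack([np.ones_like(income), income, competitors])
b0, b_income, b_competitors = np.linalg.lstsq(X, sales, rcond=None)[0]

# Simple regression of sales on competitors alone gives the marginal slope.
marginal_competitors = np.polyfit(competitors, sales, 1)[0]

print(f"partial slope for competitors:  {b_competitors:.3f}")
print(f"marginal slope for competitors: {marginal_competitors:.3f}")
```

The partial slope recovers the −24 built into the data, while the marginal slope comes out positive because it also absorbs the income effect. This is the same mechanism as omitted variable bias: leaving income out of the regression pushes its effect onto the correlated variable that remains.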
Inference in Multiple Regression
Inference for One Coefficient
- The t-statistic is used to test each slope using the null hypothesis H0: βj = 0.
- The t-statistic is calculated as
    tj = (bj − 0) / se(bj)
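As a rough illustration of how these t-statistics arise, the sketch below solves the k+1 normal equations, estimates the residual variance with n − (k+1) degrees of freedom, and divides each coefficient by its standard error. It assumes NumPy is available; the helper name `ols_t_statistics` and the ten observations are hypothetical and do not reproduce any data set from the slides.

```python
import numpy as np

def ols_t_statistics(X, y):
    """Return OLS coefficients, standard errors, and t-statistics.

    X holds one column per explanatory variable (no intercept column);
    y is the response. Helper name and layout are illustrative.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])            # add intercept column
    xtx = design.T @ design
    coef = np.linalg.solve(xtx, design.T @ y)            # solves the k+1 normal equations
    residuals = y - design @ coef
    sigma2_hat = residuals @ residuals / (n - (k + 1))   # residual variance, n-(k+1) df
    std_err = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(xtx)))
    return coef, std_err, coef / std_err                 # t for H0: coefficient = 0

# Hypothetical sample: n = 10 observations, k = 2 explanatory variables.
x1 = np.array([60.0, 75.0, 90.0, 100.0, 110.0, 120.0, 130.0, 145.0, 150.0, 165.0])
x2 = np.array([2.0, 3.0, 4.0, 2.0, 5.0, 3.0, 4.0, 6.0, 3.0, 5.0])
y = np.array([5.1, 6.8, 8.3, 7.6, 10.4, 9.7, 11.0, 13.1, 11.5, 14.2])

coef, std_err, t_stats = ols_t_statistics(np.column_stack([x1, x2]), y)
for name, b, se, t in zip(["intercept", "x1", "x2"], coef, std_err, t_stats):
    print(f"{name:>9}: b = {b:8.4f}, se = {se:7.4f}, t = {t:7.3f}")
```

With n = 10 and k = 2 the statistics have 7 degrees of freedom, so each |t| would be compared against a two-sided critical value such as 2.365 at the 5% level. Solving the normal equations directly keeps the link to the formula visible; for real work a least-squares routine or a statistics package is the more robust choice.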
t-test Results for Women's Apparel Stores
The t-statistics and associated p-values indicate that both slopes are significantly different from zero.

Prediction Intervals
- An approximate 95% prediction interval is given by ŷ ± 2 se.
- For example, the 95% prediction interval for sales per square foot at a location with median income of $70,000 and 3 competitors is approximately $545.47 ± $136.06 per square foot.

Partial Slopes: Women's Apparel Stores
- The slope b1 = 7.966 for Income implies that a store in a location where median household income is $10,000 higher sells, on average, $79.66 more per square foot than a store in a less affluent location with the same number of competitors.
- The slope b2 = −24.165 implies that, among stores in equally affluent locations, each additional competitor lowers average sales by $24.165 per square foot.

Marginal and Partial Slopes
- Partial and marginal slopes only agree when the explanatory variables are uncorrelated.
- In this example they do not agree. For instance, the marginal slope for Competitors is 4.6352. It is positive because more affluent locations tend to draw more competitors.
- The multiple regression model (MRM) separates these effects but the simple regression model (SRM) does not.

Checking Conditions
Conditions for Inference
Use the residuals from the fitted MRM to check th
- Keywords:
- multiple regression analysis

Link: https://www.31ppt.com/p-5146554.html