统计建模与R软件 (Statistical Modeling and R Software): Answers to End-of-Chapter Exercises, Chapters 2-5

Chapter 2 answers:

Ex2.1
x<-c(1,2,3)
y<-c(4,5,6)
e<-c(1,1,1)
z=2*x+y+e
z1=crossprod(x,y)    # z1 is the inner product of x and y; equivalently x%*%y
z2=tcrossprod(x,y)   # z2 is the outer product of x and y; equivalently x%o%y
z;z1;z2
Key points: basic vector assignment; the concepts of inner and outer product. The inner product is a scalar (crossprod() returns it as a 1x1 matrix); the outer product is a matrix.

Ex2.2
A<-matrix(1:20,nrow=4,ncol=5);A
B<-matrix(1:20,nrow=4,byrow=TRUE);B
C=A+B;C
# D=AB cannot be formed: A and B are both 4x5, so A%*%B is not conformable
E=A*B;E
F<-A[1:3,1:3];F
H<-matrix(c(1,2,4,5),nrow=1);H   # H is an intermediate holding an irregular set of column indices
G<-B[,H];G                       # columns 1, 2, 4, 5 of B (column 3 dropped)
Key points: how to assign a matrix. The default is byrow=FALSE, so data are filled column by column. How to extract part of the data: an array can be used as the index of another array.

Ex2.3
x<-c(rep(1,times=5),rep(2,times=3),rep(3,times=4),rep(4,times=2));x
# or omit times=, as follows
x<-c(rep(1,5),rep(2,3),rep(3,4),rep(4,2));x
Key point: how to use rep(). rep(a,b) repeats a b times.

Ex2.4
n <- 5; H<-array(0,dim=c(n,n))
for (i in 1:n){for (j in 1:n){H[i,j]<-1/(i+j-1)}};H
G <- solve(H);G     # inverse of H
ev <- eigen(H);ev   # eigenvalues and eigenvectors of H
Key points: array initialization; use of the for loop.
Open question: how can a long command (such as a for loop) be typed over several lines and then executed? Every time I press Enter to break the line, the unfinished command is executed.

Ex2.5
StudentData<-data.frame(name=c("zhangsan","lisi","wangwu","zhaoliu","dingyi"),sex=c("F","M","F","M","F"),age=c(14,15,16,14,15),height=c(156,165,157,162,159),weight=c(42,49,41.5,52,45.5));StudentData
# age, height and weight are entered without quotes so that they are stored as numbers
Key point: use of data frames.
Open question: Chinese text is garbled when logging in to the Linux server over SSH, so English names are used here.

Ex2.6
write.table(StudentData,file="studentdata.txt")
# writes the data frame StudentData to the working directory as studentdata.txt
StudentData_a<-read.table("studentdata.txt");StudentData_a
# reads studentdata.txt as a data frame and stores it in StudentData_a
write.csv(StudentData_a,"studentdata.csv")
# writes the data frame StudentData_a to the working directory as studentdata.csv, which can be opened in Excel
Key points: reading and writing files: read.table("file"), write.table(Rdata,"file"), read.csv("file"), write.csv(Rdata,"file"). The name of an external file, whether it is read or written, must be enclosed in double quotes.

Ex2.7
Fun<-function(n){
  if(n <= 0) list(fail="please input a integer above 0!")
  else{
    repeat{
      if(n==1) break
      else if(n%%2==0){n<-n/2}
      else n<- 3*n+1
    }
    list("sucess!")
  }
}
Under Linux, create an R file, enter the code above and save it as "2.7.R". Then start R in the current directory and run source("2.7.R"), which loads the script. The function can then be called.
Fun(67) displays "sucess!".
Fun(-1) displays
$fail
[1] "please input a integer above 0!"
Open question: can source("*.R") be understood as loading the R file? How is an R file closed from within R?

Chapter 3 answers:

Ex3.1
Create the following text file, 3.1.txt:
74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8
72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8 72.0 68.8 76.5 73.5 72.7 75.0 70.4
78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4
76.5 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2
80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7 84.3 69.7 74.3 71.2 74.3 75.0
72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5
73.5 72.0 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4
Write a function (saved as data_outline.R) that computes the descriptive statistics of a sample.
data_outline<-function(x){
  n<-length(x)
  m<-mean(x)
  v<-var(x)
  s<-sd(x)
  me<-median(x)
  cv<-100*s/m
  css<-sum((x-m)^2)
  uss<-sum(x^2)
  R <- max(x)-min(x)
  R1 <-quantile(x,3/4)-quantile(x,1/4)
  sm <-s/sqrt(n)
  g1 <-n/((n-1)*(n-2))*sum((x-m)^3)/s^3
  g2 <-((n*(n+1))/((n-1)*(n-2)*(n-3))*sum((x-m)^4))/s^4-(3*(n-1)^2)/((n-2)*(n-3))
  data.frame(N=n,Mean=m,Var=v,std_dev=s,Median=me,std_mean=sm,CV=cv,CSS=css,USS=uss,R=R,R1=R1,Skewness=g1,Kurtosis=g2,row.names=1)
}
Start R, then:
source("data_outline.R")                  # load the program
serumdata<-scan("3.1.txt");serumdata      # read the data into the vector serumdata
data_outline(serumdata)
The result is:
    N   Mean      Var  std_dev Median  std_mean       CV      CSS      USS  R
1 100 73.696 15.41675 3.926417   73.5 0.3926417 5.327857 1526.258 544636.3 20
   R1   Skewness   Kurtosis
1 4.6 0.03854249 0.07051809
Key points: read.table() reads files laid out as a table. Because the seventh row of the data above is short by several values, read.table() cannot read it. scan() reads plain text files directly. scan() combined with matrix() can also store the data as a matrix:
X<-matrix(scan("3.1.txt",0),ncol=10,byrow=TRUE)   # arrange the data above into a 10*10 matrix
scan() can also take data typed at the console: run Y<-scan() and enter the values at the prompt. Press Enter on an empty line to end the input.

Ex3.2
> hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main=paste("the histogram of serumdata"),xlab="age",ylab="frequency")
# Histogram. col is the fill colour (default: none). border is the border colour (default: the foreground colour). density draws shading lines on the bars (default: none). angle is the slope of the shading lines, counter-clockwise (default: 45 degrees). main, xlab and ylab are the title and the x and y axis labels.
> lines(density(serumdata),col="blue")   # density estimate curve
> x<-64:85
> lines(x,dnorm(x,mean(serumdata),sd(serumdata)),col="green")   # normal probability density curve
> plot(ecdf(serumdata),verticals=TRUE,do.p=FALSE)               # empirical distribution plot
> lines(x,pnorm(x,mean(serumdata),sd(serumdata)),col="blue")    # normal distribution function
> qqnorm(serumdata,col="purple")   # QQ plot
> qqline(serumdata,col="red")      # QQ line

Ex3.3
> stem(serumdata,scale=1)   # stem-and-leaf plot; the decimal part of the raw data is rounded

  The decimal point is at the |

  64 | 300
  66 | 23333
  68 | 00888777
  70 | 34444442222
  72 | 0000000777777755555555555
  74 | 033333333700000004688888
  76 | 5555555226
  78 | 0888555
  80 | 355266
  82 |
  84 | 3

> boxplot(serumdata,col="lightblue",notch=T)   # box plot; notch=T draws it with a notch
> fivenum(serumdata)                           # five-number summary
[1] 64.3 71.2 73.5 75.8 84.3

Ex3.4
> shapiro.test(serumdata)   # Shapiro-Wilk test of normality

        Shapiro-Wilk normality test

data:  serumdata
W = 0.9897, p-value = 0.6437

Conclusion: the p-value is greater than 0.05, so the data can be regarded as coming from a normally distributed population.
> ks.test(serumdata,"pnorm",mean(serumdata),sd(serumdata))   # Kolmogorov-Smirnov test of normality

        One-sample Kolmogorov-Smirnov test

data:  serumdata
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided

Warning message:
In ks.test(serumdata, "pnorm", mean(serumdata), sd(serumdata)) :
  cannot compute correct p-values with ties

Conclusion: the p-value is greater than 0.05, so the data can be regarded as coming from a normally distributed population.
Note: the warning appears because the data contain repeated values. The KS test assumes the data are continuous, so ties are not expected.

Ex3.5
> y<-c(2,4,3,2,4,7,7,2,2,5,4,5,6,8,5,10,7,12,12,6,6,7,11,6,6,7,9,5,5,10,6,3,10)   # enter the data
> f<-factor(c(rep(1,11),rep(2,10),rep(3,12)))   # grouping factor
> plot(f,y,col="lightgreen")                    # plot() produces box plots
> x<-c(2,4,3,2,4,7,7,2,2,5,4)
> y<-c(5,6,8,5,10,7,12,12,6,6)
> z<-c(7,11,6,6,7,9,5,5,10,6,3,10)
> boxplot(x,y,z,names=c("1","2","3"),col=c(5,6,7))   # boxplot() produces box plots
Conclusion: judging from the box plots, groups 2 and 3 do not differ noticeably, while group 1 differs from the other two.

Ex3.6
There is too much data, so it was not entered. A scatter plot should only need plot().

Ex3.7
> studata<-read.table("3.7.txt")   # read the data
> data.frame(studata)              # convert to a data frame
   V1      V2 V3 V4   V5    V6
1   1   alice  f 13 56.5  84.0
2   2   becka  f 13 65.3  98.0
3   3    gail  f 14 64.3  90.0
4   4   karen  f 12 56.3  77.0
5   5   kathy  f 12 59.8  84.5
6   6    mary  f 15 66.5 112.0
7   7   sandy  f 11 51.3  50.5
8   8  sharon  f 15 62.5 112.5
9   9   tammy  f 14 62.8 102.5
10 10  alfred  m 14 69.0 112.5
11 11    duke  m 14 63.5 102.5
12 12   guido  m 15 67.0 133.0
13 13   james  m 12 57.3  83.0
14 14 jeffery  m 13 62.5  84.0
15 15    john  m 12 59.0  99.5
16 16  philip  m 16 72.0 150.0
17 17  robert  m 12 64.8 128.0
18 18  thomas  m 11 57.5  85.0
19 19 william  m 15 66.5 112.0
> names(studata)<-c("stuno","name","sex","age","height","weight");studata   # name the columns
  stuno  name sex age height weight
1     1 alice   f  13   56.5   84.0
2     2 becka   f  13   65.3   98.0
3     3  gail   f  14   64.3   90.0
...
> attach(studata)                              # attach the data frame
> plot(weight~height,col="red")                # scatter plot of weight against height
> coplot(weight~height|sex,col="blue")         # weight against height, by sex
> coplot(weight~height|age,col="blue")         # weight against height, by age
> coplot(weight~height|age+sex,col="blue")     # weight against height, by age and sex

Ex3.8
> x<-seq(-2,3,0.05)
> y<-seq(-1,7,0.05)
> f<-function(x,y) x^4-2*x^2*y+x^2-2*x*y+2*y^2+4.5*x-4*y+4
> z<-outer(x,y,f)   # the outer product is needed to draw the three-dimensional figure
> contour(x,y,z,levels=c(0,1,2,3,4,5,10,15,20,30,40,50,60,80,100),col="blue")   # two-dimensional contour lines
> persp(x,y,z,theta=120,phi=0,expand=0.7,col="lightblue")                       # three-dimensional mesh surface

Ex3.9
> attach(studata)
> cor.test(height,weight)   # Pearson correlation test

        Pearson's product-moment correlation

data:  height and weight
t = 7.5549, df = 17, p-value = 7.887e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7044314 0.9523101
sample estimates:
      cor
0.8777852

Hence height and weight are correlated.

Chapter 4 answers:

Ex4.2
For the exponential distribution, the maximum likelihood estimate of λ is n/sum(Xi).
> x<-c(rep(5,365),rep(15,245),rep(25,150),rep(35,100),rep(45,70),rep(55,45),rep(65,25))
> lamda<-length(x)/sum(x);lamda
[1] 0.05

Ex4.3
For the Poisson distribution, P(X=k) = λ^k/k! * e^(-λ). Its mean and variance are equal, both λ, which here is the average number of E. coli per litre of water. Taking the sample mean is enough.
> x<-c(rep(0,17),rep(1,20),rep(2,10),rep(3,2),rep(4,1))
> mean(x)
[1] 1
The average is 1.

Ex4.4
> obj<-function(x){f<-c(-13+x[1]+((5-x[2])*x[2]-2)*x[2],-29+x[1]+((x[2]+1)*x[2]-14)*x[2]);sum(f^2)}
# The author admits to not fully understanding this one; it is an unconstrained optimization problem that minimizes the sum of squares of f.
> x0<-c(0.5,-2)
> nlm(obj,x0)
$minimum
[1] 48.98425

$estimate
[1] 11.4127791 -0.8968052

$gradient
[1]  1.411401e-08 -1.493206e-07

$code
[1] 1

$iterations
[1] 16

Ex4.5
> x<-c(54,67,68,78,70,66,67,70,65,69)
> t.test(x)   # t.test() gives the interval estimate for a single normal sample

        One Sample t-test

data:  x
t = 35.947, df = 9, p-value = 4.938e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 63.1585 71.6415
sample estimates:
mean of x
     67.4

The point estimate of the mean pulse is 67.4, and the 95% interval estimate is [63.1585, 71.6415].
> t.test(x,alternative="less",mu=72)   # one-sided test and interval for a single normal sample

        One Sample t-test

data:  x
t = -2.4534, df = 9, p-value = 0.01828
alternative hypothesis: true mean is less than 72
95 percent confidence interval:
     -Inf 70.83705
sample estimates:
mean of x
     67.4

The p-value is less than 0.05, so the null hypothesis is rejected: the mean pulse is lower than that of healthy people.
Key points: usage of t.test(). This is the one-sample case; both two-sided and one-sided tests are available.

Ex4.6
> x<-c(140,137,136,140,145,148,140,135,144,141);x
 [1] 140 137 136 140 145 148 140 135 144 141
> y<-c(135,118,115,140,128,131,130,115,131,125);y
 [1] 135 118 115 140 128 131 130 115 131 125
> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = 4.6287, df = 18, p-value = 0.0002087
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.53626 20.06374
sample estimates:
mean of x mean of y
    140.6     126.8

The 95% confidence interval for the difference in means is [7.53626, 20.06374].
Key points: t.test() can estimate the difference in means of two normal samples. Here the two variances are assumed equal.
PS: I have a feeling this exercise should use a paired t-test instead?

Ex4.7
> x<-c(0.143,0.142,0.143,0.137)
> y<-c(0.140,0.142,0.136,0.138,0.140)
> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = 1.198, df = 7, p-value = 0.2699
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.001996351  0.006096351
sample estimates:
mean of x mean of y
  0.14125   0.13920

The 95% interval estimate for the difference in means is [-0.001996351, 0.006096351].

Ex4.8
Continuing from Ex4.6:
> var.test(x,y)

        F test to compare two variances

data:  x and y
F = 0.2353, num df = 9, denom df = 9, p-value = 0.04229
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.05845276 0.94743902
sample estimates:
ratio of variances
         0.2353305

Key points: var.test() estimates the ratio of two sample variances. From this result the variances can be regarded as unequal, so the difference in means in Ex4.6 should be computed with the unequal-variance setting.
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = 4.6287, df = 13.014, p-value = 0.0004712
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.359713 20.240287
sample estimates:
mean of x mean of y
    140.6     126.8

The 95% confidence interval for the difference in means is [7.359713, 20.240287].
Key points: t.test(x,y,var.equal=TRUE) estimates the difference in means of two normal samples with equal variances; t.test(x,y) does so for unequal variances.

Ex4.9
> x<-c(rep(0,7),rep(1,10),rep(2,12),rep(3,8),rep(4,3),rep(5,2))
> n<-length(x)
> tmp<-sd(x)/sqrt(n)*qnorm(1-0.05/2)
> mean(x)
[1] 1.904762
> mean(x)-tmp;mean(x)+tmp
[1] 1.494041
[1] 2.315483
The mean number of calls is 1.9. The 0.95 confidence interval is [1.49, 2.32].

Ex4.10
> x<-c(1067,919,1196,785,1126,936,918,1156,920,948)
> t.test(x,alternative="greater")

        One Sample t-test

data:  x
t = 23.9693, df = 9, p-value = 9.148e-10
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 920.8443      Inf
sample estimates:
mean of x
    997.1

The one-sided lower confidence limit for the mean bulb lifetime at the 95% level is 920.8443.
Key point: t.test() gives one-sided confidence intervals.

Chapter 5 answers (hypothesis testing):

Ex5.1
> x<-c(220, 188, 162, 230, 145, 160, 238, 188, 247, 113, 126, 245, 164, 231, 256, 183, 190, 158, 224, 175)
> t.test(x,mu=225)

        One Sample t-test

data:  x
t = -3.4783, df = 19, p-value = 0.002516
alternative hypothesis: true mean is not equal to 225
95 percent confidence interval:
 172.3827 211.9173
sample estimates:
mean of x
   192.15

Null hypothesis: the platelet count of the painters does not differ from that of normal adult men.
Alternative hypothesis: the platelet count of the painters differs from that of normal adult men.
The p-value is less than 0.05, so the null hypothesis is rejected: the platelet count of the painters differs from that of normal adult men.
The test above is two-sided. A one-sided test can also be used.
Alternative hypothesis: the platelet count of the painters is lower than that of normal adult men.
> t.test(x,mu=225,alternative="less")

        One Sample t-test

data:  x
t = -3.4783, df = 19, p-value = 0.001258
alternative hypothesis: true mean is less than 225
95 percent confidence interval:
     -Inf 208.4806
sample estimates:
mean of x
   192.15

This likewise leads to the conclusion that the platelet count of the painters is lower than that of normal adult men.

Ex5.2
> x   # the bulb lifetime data from Ex4.10
 [1] 1067  919 1196  785 1126  936  918 1156  920  948
> pnorm(1000,mean(x),sd(x))
[1] 0.5087941
The probability that x<=1000 is 0.509, so the probability that x is greater than 1000 is 0.491.
Key points: pnorm() computes the distribution function of the normal distribution. In R, such functions return lower-tail values by default.

Ex5.3
> A<-c(113,120,138,120,100,118,138,123)
> B<-c(138,116,125,136,110,132,130,110)
> t.test(A,B,paired=TRUE)

        Paired t-test

data:  A and B
t = -0.6513, df = 7, p-value = 0.5357
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.62889   8.87889
sample estimates:
mean of the differences
                 -3.375

The p-value is greater than 0.05, so the null hypothesis is not rejected: the two treatments show no difference.

Ex5.4
(1) Shapiro-Wilk (W) test of normality:
> x<-c(-0.7,-5.6,2,2.8,0.7,3.5,4,5.8,7.1,-0.5,2.5,-1.6,1.7,3,0.4,4.5,4.6,2.5,6,-1.4)
> y<-c(3.7,6.5,5,5.2,0.8,0.2,0.6,3.4,6.6,-1.1,6,3.8,2,1.6,2,2.2,1.2,3.1,1.7,-2)
> shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.9699, p-value = 0.7527

> shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.971, p-value = 0.7754

KS test:
> ks.test(x,"pnorm",mean(x),sd(x))

        One-sample Kolmogorov-Smirnov test

data:  x
D = 0.1065, p-value = 0.977
alternative hypothesis: two-sided

Warning message:
In ks.test(x, "pnorm", mean(x), sd(x)) :
  cannot compute correct p-values with ties
> ks.test(y,"pnorm",mean(y),sd(y))

        One-sample Kolmogorov-Smirnov test

data:  y
D = 0.1197, p-value = 0.9368
alternative hypothesis: two-sided

Warning message:
In ks.test(y, "pnorm", mean(y), sd(y)) :
  cannot compute correct p-values with ties

Pearson goodness-of-fit test, taking x as the example:
> sort(x)
 [1] -5.6 -1.6 -1.4 -0.7 -0.5  0.4  0.7  1.7  2.0  2.5  2.5  2.8  3.0  3.5  4.0
[16]  4.5  4.6  5.8  6.0  7.1
> x1<-table(cut(x,br=c(-6,-3,0,3,6,9)))
> p<-pnorm(c(-3,0,3,6,9),mean(x),sd(x))
> p
[1] 0.04894712 0.24990009 0.62002288 0.90075856 0.98828138
> p<-c(p[1],p[2]-p[1],p[3]-p[2],p[4]-p[3],1-p[4]);p
[1] 0.04894712 0.20095298 0.37012278 0.28073568 0.09924144
> chisq.test(x1,p=p)

        Chi-squared test for given probabilities

data:  x1
X-squared = 0.5639, df = 4, p-value = 0.967

Warning message:
In chisq.test(x1, p = p) : Chi-squared approximation may be incorrect

The p-value is 0.967, so the null hypothesis is not rejected: x is consistent with a normal distribution.

(2) t-test under the equal-variance model:
> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.6419, df = 38, p-value = 0.5248
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.326179  1.206179
sample estimates:
mean of x mean of y
    2.065     2.625

t-test under the unequal-variance model:
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -0.6419, df = 36.086, p-value = 0.525
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.32926  1.20926
sample estimates:
mean of x mean of y
    2.065     2.625

Paired t-test:
> t.test(x,y,paired=TRUE)

        Paired t-test

data:  x and y
t = -0.6464, df = 19, p-value = 0.5257
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.373146  1.253146
sample estimates:
mean of the differences
                  -0.56

All three tests show no difference between the means of the two groups.

(3) Test of variances:
> var.test(x,y)

        F test to compare two variances

data:  x and y
F = 1.5984, num df = 19, denom df = 19, p-va
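The Pearson goodness-of-fit step in Ex5.4 (1) builds the five bin probabilities by hand. As a minimal sketch, the same computation can be written more compactly with diff(); the variable names breaks, counts, cdf and res are illustrative choices rather than part of the original answer, and suppressWarnings() is used only to silence the small-expected-count warning already seen in the transcript.

```r
# Sketch: Pearson chi-square goodness-of-fit check of normality for x (Ex5.4).
x <- c(-0.7,-5.6,2,2.8,0.7,3.5,4,5.8,7.1,-0.5,2.5,-1.6,1.7,3,0.4,4.5,4.6,2.5,6,-1.4)

breaks <- c(-6,-3,0,3,6,9)                 # same bins as in the answer
counts <- table(cut(x, breaks = breaks))   # observed count in each bin

# Normal CDF at the four interior break points; the first and last bins
# absorb the two tails, exactly as c(p[1], ..., 1-p[4]) does in the answer.
cdf <- pnorm(breaks[2:5], mean(x), sd(x))
p <- diff(c(0, cdf, 1))                    # five bin probabilities, summing to 1

res <- suppressWarnings(chisq.test(counts, p = p))
res$statistic
res$p.value
```

One caveat on reading the result: chisq.test() uses df = 4 here and does not subtract degrees of freedom for the mean and standard deviation estimated from the data, so the reported p-value is best taken as approximate.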

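Ex4.2 states the maximum likelihood estimate for the exponential distribution without deriving it. A short derivation, assuming the usual density f(x) = λe^{-λx} for x > 0 (the answer does not spell the density out), is sketched here:

```latex
L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
           = \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr),
\qquad
\ln L(\lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i .

\frac{d}{d\lambda} \ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} .
```

With the data of Ex4.2, n = 1000 and the sum of the observations is 20000, which gives the value 0.05 returned by length(x)/sum(x).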