Graduate R Language Exam Paper (研究生R语言考题免费下载.doc)

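Jinan University Examination Paper

To be completed by the instructor
Academic year: 2010-2011, Semester 2
Course: Data Analysis and R Language Applications
Instructor: Wang Binhui (王斌会)
Exam date: November 8, 2011
Course category: required [ ]  elective [√]
Exam format: open-book [√]  closed-book [ ]
Paper type (A, B): A    Total pages: 4

To be completed by the student
School: College of Economics    Major: Quantitative Economics    Class:
Name: Liu Wei (刘伟)    Student ID: 1130111008
Admission: domestic [ ]  overseas [ ]

Question:  I  II  III  IV  V  VI  VII  VIII  IX  X  Total
Score:

Score:        Grader:

I. Statistical Charts (1 question, 20 points)

1. Use R charts to analyze the supply trends of each product category.

Category \ Month      1  2  3  4  5  6  7  8  9  10  11  12
TV                    A1
Refrigerator          A2
Air conditioner       A3
Washing machine       A4

(1) Requirement: generate the data with R's random number functions, as uniform random numbers on [20, 50].

Solution: first initialize R and set the graphics parameters, then generate the random numbers. The code is as follows:

    rm(list=ls())
    options(digits=4)
    par(mar=c(4,4,2,1)+0.1,cex=0.75)
    A1=runif(12,20,50);A1
    A2=runif(12,20,50);A2
    A3=runif(12,20,50);A3
    A4=runif(12,20,50);A4

(2) Analysis (the figures should be given some decoration):

1) Draw the monthly trend line plot for each product category.

Solution: the trend line plots are produced by the following code:

    par(mfrow=c(2,2))
    plot(A1,type="l",ylab="Sales volume",xlab="Month",main="TV (A1)",xlim=c(1,12),ylim=c(0,50))
    plot(A2,type="l",ylab="Sales volume",xlab="Month",main="Refrigerator (A2)",xlim=c(1,12),ylim=c(0,50))
    plot(A3,type="l",ylab="Sales volume",xlab="Month",main="Air conditioner (A3)",xlim=c(1,12),ylim=c(0,50))
    plot(A4,type="l",ylab="Sales volume",xlab="Month",main="Washing machine (A4)",xlim=c(1,12),ylim=c(0,50))

2) Draw the quarterly bar chart for each product category.

Solution: first aggregate the monthly data into quarterly totals for each product. The bar charts are produced by the following code:

    dat=data.frame(A1,A2,A3,A4)
    # each q vector holds the four quarterly totals of one product (column of dat)
    q1=c(dat[1,1]+dat[2,1]+dat[3,1],dat[4,1]+dat[5,1]+dat[6,1],dat[7,1]+dat[8,1]+dat[9,1],dat[10,1]+dat[11,1]+dat[12,1])
    q2=c(dat[1,2]+dat[2,2]+dat[3,2],dat[4,2]+dat[5,2]+dat[6,2],dat[7,2]+dat[8,2]+dat[9,2],dat[10,2]+dat[11,2]+dat[12,2])
    q3=c(dat[1,3]+dat[2,3]+dat[3,3],dat[4,3]+dat[5,3]+dat[6,3],dat[7,3]+dat[8,3]+dat[9,3],dat[10,3]+dat[11,3]+dat[12,3])
    q4=c(dat[1,4]+dat[2,4]+dat[3,4],dat[4,4]+dat[5,4]+dat[6,4],dat[7,4]+dat[8,4]+dat[9,4],dat[10,4]+dat[11,4]+dat[12,4])
    dat1=data.frame(q1,q2,q3,q4);dat1
    par(mfrow=c(2,2))
    barplot(dat1[,1],xlab="Quarter",ylab="Sales volume",main="TV (A1)",ylim=c(0,150))
    barplot(dat1[,2],xlab="Quarter",ylab="Sales volume",main="Refrigerator (A2)",ylim=c(0,150))
    barplot(dat1[,3],xlab="Quarter",ylab="Sales volume",main="Air conditioner (A3)",ylim=c(0,150))
    barplot(dat1[,4],xlab="Quarter",ylab="Sales volume",main="Washing machine (A4)",ylim=c(0,150))

3) Draw the annual pie chart for the product categories.

Solution: the pie chart is produced by the following code:

    par(mfrow=c(1,1))
    # annual total of each product = sum of its four quarterly totals
    y1=dat1[1,1]+dat1[2,1]+dat1[3,1]+dat1[4,1]
    y2=dat1[1,2]+dat1[2,2]+dat1[3,2]+dat1[4,2]
    y3=dat1[1,3]+dat1[2,3]+dat1[3,3]+dat1[4,3]
    y4=dat1[1,4]+dat1[2,4]+dat1[3,4]+dat1[4,4]
    x=c(y1,y2,y3,y4)
    pie(x,labels=c("TV (A1)","Refrigerator (A2)","Air conditioner (A3)","Washing machine (A4)"),col=c("red","green","purple","blue"))

Score:        Grader:

II. Statistical Tests (2 questions, 10 points each, 20 points in total)

1.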
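Two milling machines produce the same model of sleeve. Under normal conditions the inner groove depths of the sleeves they machine follow the normal distributions N(10, 0.3^2) and N(8, 0.2^2) respectively. Samples of 13 and 15 sleeves are drawn from the output of the two machines. Test whether the depths of the two machines' products differ, for the variance-known and the variance-unknown case (α = 0.05).

(1) Is there a significant difference in the inner groove precision (variance) of the two machines' products?

Solution:

    x=rnorm(13,10,0.3)
    y=rnorm(15,8,0.2)
    var.test(x,y)

            F test to compare two variances

    data:  x and y
    F = 3.899, num df = 12, denom df = 14, p-value = 0.01785
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
      1.278 12.502
    sample estimates:
    ratio of variances
                 3.899

Since p-value = 0.01785 < 0.05, the inner groove precision (variance) of the two machines' products differs significantly.

(2) Do the depths of the two machines' products differ?

Solution:

1. Variance unknown

    t.test(x,y)

            Welch Two Sample t-test

    data:  x and y
    t = 17.61, df = 17.2, p-value = 1.934e-12
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     1.605 2.042
    sample estimates:
    mean of x mean of y
        9.982     8.159

Since p-value = 1.934e-12 < 0.05, the depths of the two machines' products differ.

2. Variance known

    u.test=function(x,y,sigmax,sigmay){
      nx=length(x)
      ny=length(y)
      xbar=mean(x)
      ybar=mean(y)
      # z statistic for two independent samples with known standard deviations
      u=(xbar-ybar)/sqrt(sigmax^2/nx+sigmay^2/ny)
      # one-sided (upper-tail) p-value
      p=pnorm(u,lower.tail=FALSE)
      c(u=u,p=p)
    }
    u.test(x,y,0.3,.02)

            u          p
    2.187e+01 2.332e-106

Since p = 2.332e-106 < 0.05, the depths of the two machines' products differ.

Note: the call passes 0.02 for sigmay, whereas the problem specifies a standard deviation of 0.2; the output above corresponds to the call as written. With sigmay = 0.2 and the sample means reported above, u is approximately 18.6, so the conclusion is unchanged. The p-value is also one-sided (upper tail); doubling it for the two-sided question does not change the conclusion either.

2.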
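Suppose a third milling machine produces the same model of sleeve, with inner groove depth following the normal distribution N(12, 0.4^2), and a sample of 18 sleeves is drawn from its output. Test whether the depths of the three machines' products differ, for the variance-known and the variance-unknown case (α = 0.05).

(1) Variance known

    x1=rnorm(13,10,0.3)
    x2=rnorm(15,8,0.2)
    x3=sample(rnorm(1000,12,0.4),18)
    n1=length(x1)
    n2=length(x2)
    n3=length(x3)
    # standard errors of the three pairwise mean differences
    se1=sqrt(0.3^2/n1+0.2^2/n2)
    se2=sqrt(0.3^2/n1+0.4^2/n3)
    se3=sqrt(0.2^2/n2+0.4^2/n3)
    x1bar=mean(x1)
    x2bar=mean(x2)
    x3bar=mean(x3)
    u1=(x1bar-x2bar)/se1
    u2=(x1bar-x3bar)/se2   # the original divided by se3 here; se2 is the matching standard error
    u3=(x2bar-x3bar)/se3
    chi=u1^2+u2^2+u3^2
    # upper-tail probability of a chi-square with 3 df, the reference distribution
    # used in this answer; the original multiplied it by 2, which is not needed
    p=pchisq(chi,3,lower.tail=FALSE);p

Since p < α = 0.05, we reject the hypothesis that the depths of the three machines' products are equal; the depths differ.

(2) Variance unknown

    y=c(x1,x2,x3)
    group=c(rep(1,13),rep(2,15),rep(3,18))
    oneway.test(y~group)

            One-way analysis of means (not assuming equal variances)

    data:  y and group
    F = 564.7, num df = 2.00, denom df = 28.28, p-value < 2.2e-16

Since p < α = 0.05, we reject the hypothesis that the depths of the three machines' products are equal; the depths differ.

Score:        Grader:

III. Statistical Modeling (1 question, 20 points)

To study the relationship between department store sales x and the circulation expense rate y (an indicator of the quality of commercial activity: the circulation expense borne by each yuan of merchandise turnover), data were collected from 25 stores. Assume here that x consists of uniform random numbers on [10, 30] and that e consists of normal random numbers, e ~ N(0, 0.3^2).

(1) Let y = 2 + 3*x + e. Use R to fit the linear regression model y = a + b*x.

Solution:

    x=runif(25,10,30)
    e=rnorm(25,0,0.3)
    y=c(2+3*x+e)
    fm=lm(y~x)
    fm

    Call:
    lm(formula = y ~ x)

    Coefficients:
    (Intercept)            x
           2.06         3.00

Hence y = 2.06 + 3.00x.

(2) Let y = 2 + 3*ln(x) + e. Use R to fit the logarithmic regression model y = a + b*ln(x).

Solution:

    x=runif(25,10,30)
    e=rnorm(25,0,0.3)
    y=c(2+3*log(x)+e)
    fm=lm(y~log(x))
    fm

    Call:
    lm(formula = y ~ log(x))

    Coefficients:
    (Intercept)       log(x)
           2.03         3.02

Hence y = 2.03 + 3.02 ln(x).

(3) Let y = 2 + 3*x^2 + 0.5*x^3 + e. Use R to fit the regression model y = a + b*x^2 + c*x^3 + e.

Solution:

    x=runif(25,10,30)
    e=rnorm(25,0,0.3)
    y=c(2+3*x^2+0.5*x^3+e)
    fm=lm(y~I(x^2)+I(x^3))
    fm

    Call:
    lm(formula = y ~ I(x^2) + I(x^3))

    Coefficients:
    (Intercept)       I(x^2)       I(x^3)
          2.296        2.999        0.500

Hence y = 2.296 + 2.999*x^2 + 0.500*x^3.

Score:        Grader:

IV. Data Analysis (2 questions, 20 points each, 40 points in total)

1. Let X be a matrix of random normal values (mean = 0; sd = 1) having 10 columns and N = 100 rows. Reset the values in the first row of the matrix to (1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4). Assume that the first 5 columns of data in each row correspond to a group A, while the remaining 5 correspond to another group B.

1.)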
For each row of the matrix X, compute:

a) the t-statistic comparing the groups A and B assuming equal variance, and the p-value;

b) the probability of observing such a t-statistic by chance alone, using a permutation analysis. The following strategy will be used: the columns will be randomly permuted nk = 1000 times, and at each iteration the t-statistic will be computed again and recorded in a vector. At the end, compute the p-value as the number of times out of nk that the t-statistic from the permuted data was at least as extreme as the t-statistic obtained from the real (non-permuted) data.

Present the result as a data.frame with 4 columns: ID = row number, t = t-score, p_theoretical = p-value assuming the asymptotic distribution, p_permutations = p-value from permutations. Sort the data.frame in descending order of p-values.

Solution:

    X1=matrix(rnorm(1000,0,1),100,10)
    X2=X1[2:100,1:10]
    V=c(1,1.5,1.4,3,1.9,4,4.9,2.6,3.2,2.4)
    AB=rbind(V,X2)   # build the matrix AB described in the problem: columns 1-5 are group A, columns 6-10 are group B
    # p-value of the two-sample t-test for each row of AB
    p_theoretical=apply(AB,1,function(x) t.test(x[1:5],x[6:10],var.equal=TRUE)$p.value)
    # t-value of the two-sample t-test for each row of AB
    t=apply(AB,1,function(x) t.test(x[1:5],x[6:10],var.equal=TRUE)$statistic)
    t1=abs(t)        # absolute value of the t-values
    m=list()
    tt=list()        # two lists, m and tt, used in the permutation loop below
    for(i in c(1:1000)){
      m[[i]]=AB[,sample(ncol(AB))]   # randomly permute the 10 columns of AB, giving 1000 new matrices
      # absolute t-value of the two-sample t-test for each row of each permuted matrix
      tt[[i]]=abs(apply(m[[i]],1,function(x) t.test(x[1:5],x[6:10],var.equal=TRUE)$statistic))
    }                # this loop takes about 15 minutes
    u=do.call(cbind,tt)      # convert the list tt into the matrix u
    p_permutations=array()   # array to hold the permutation p-values
    for(j in c(1:100)){
      # proportion of permuted |t| values in row j of u that are >= the observed |t| in t1
      p_permutations[j]=sum(u[j,]>=t1[j])/1000
    }
    ID=c(1:100)      # row number
    result=data.frame(ID,t,p_theoretical,p_permutations)   # output the result

         ID          t p_theoretical p_permutations
    1     1 -2.8853207       0.02034          0.015
    2     2 -1.3143173       0.22517          0.227
    3     3 -2.6940739       0.02732          0.006
    4     4 -1.3802458       0.20485          0.228
    5     5  0.0473332       0.96341          0.977
    6     6  0.9959910       0.34842          0.291
    7     7 -0.4321985       0.67701          0.722
    8     8 -0.4268158       0.68077          0.678
    9     9 -0.1412908       0.89113          0.899
    10   10  1.3169132       0.22434          0.254
    11   11 -0.0489858       0.96213          0.988
    12   12  1.4028308       0.19826          0.169
    13   13  1.1741980       0.27408          0.256
    14   14 -1.0676853       0.31682          0.317
    15   15  2.6462644       0.02943          0.012
    16   16 -1.9388080       0.08851          0.061
    17   17 -0.6731468       0.51982          0.536
    18   18 -0.3519451       0.73397          0.755
    19   19  0.5819389       0.57663          0.577
    20   20  0.6656648       0.52435          0.577
    21   21 -0.1460266       0.88751          0.748
    22   22 -2.7035412       0.02693          0.023
    23   23 -0.7900807       0.45226          0.472
    24   24 -0.4385279       0.67260          0.629
    25   25  0.3676752       0.72265          0.742
    26   26  2.2177703       0.05738          0.035
    27   27 -0.1415220       0.89096          0.918
    28   28 -0.0004833       0.99963          1.000
    29   29 -0.4102665       0.69238          0.687
    30   30  0.0448795       0.96530          0.944
    31   31  1.4521009       0.18454          0.193
    32   32  0.2854683       0.78254          0.793
    33   33  0.1298656       0.89988          0.896
    34   34 -0.9887282       0.35175          0.346
    35   35 -1.4024164       0.19838          0.158
    36   36  0.1285964       0.90085          0.850
    37   37  3.0086702       0.01685          0.032
    38   38  0.6787512       0.51645          0.503
    39   39  1.0937833       0.30589          0.331
    40   40  1.9842917       0.08250          0.102
    41   41 -0.0968299       0.92524          0.969
    42   42 -3.1236870       0.01415          0.016
    43   43  0.6596542       0.52801          0.501
    44   44  1.2064446       0.26211          0.267
    45   45 -0.2720078       0.79250          0.812
    46   46 -0.1762814       0.86445          0.860
    47   47 -0.2106570       0.83842          0.875
    48   48  1.9866318       0.08220          0.072
    49   49 -0.0020090       0.99845          1.000
    50   50 -2.0646728       0.07283          0.031
    51   51 -0.2564763       0.80406          0.797
    52   52  2.8462192       0.02160          0.028
    53   53 -0.0657272       0.94921          0.966
    54   54  0.6318659       0.54510          0.589
    55   55  0.7555194       0.47159          0.425
    56   56  1.5005975       0.17185          0.179
    57   57  1.2630394       0.24214          0.243
    58   58 -0.6921399       0.50844          0.539
    59   59 -1.3837198       0.20382          0.191
    60   60  0.4838255       0.64148          0.639
    61   61 -0.1140807       0.91198          0.944
    62   62  1.1902273       0.26808          0.258
    63   63  0.2112508       0.83797          0.836
    64   64  0.8978211       0.39550          0.373
    65   65 -0.4111239       0.69177          0.712
    66   66  3.1668091       0.01326          0.016
    67   67  1.6002204       0.14822          0.142
    68   68  0.8488613       0.42063          0.397
    69   69  0.7102140       0.49775          0.493
    70   70  0.0780357       0.93972          0.930
    71   71  1.2471030       0.24763          0.259
    72   72  0.9615742       0.36442          0.330
    73   73  1.2386493       0.25059          0.272
    74   74 -0.6696054       0.52196          0.537
    75   75  1.2152901       0.25890          0.254
    76   76 -0.1259623       0.90287          0.924
    77   77 -0.0791718       0.93884          0.964
    78   78 -3.6579469       0.00642          0.016
    79   79  0.2499050       0.80896          0.780
    80   80 -2.3661564       0.04552          0.052
    81   81  0.0254582       0.98031          0.988
    82   82 -1.9755606       0.08362          0.085
    83   83  0.0015678       0.99879          1.000
    84   84  0.1471918       0.88662          0.871
    85   85  0.0295263       0.97717          0.984
    86   86  0.2628059       0.79934          0.809
    87   87  0.0846366       0.93463          0.928
    88   88 -0.5474582       0.59900          0.635
    89   89 -2.7182033       0.02632          0.013
    90   90  0.5733288       0.58218          0.678
    91   91  2.3551061       0.04631          0.048
    92   92  0.7709098       0.46292          0.435
    93   93 -0.4178680       0.68703          0.634
    94   94  0.2916059       0.77801          0.736
    95   95  0.2728228       0.79190          0.762
    96   96  1.5964084       0.14906          0.155
    97   97  1.3246252       0.22188          0.266
    98   98  0.6856973       0.51228          0.511
    99   99 -1.2272690       0.25461          0.248
    100 100  0.5488290       0.59810          0.591

2.) Plot the distribution (see hist) of the resulting vector of t-scores obtained at step 1a) after excluding the first element (corresponding to the first row), and on the same graph show a vertical line for the t-value of the first row.

Solution:

    t2=t[2:100]
    hist(t2,main="Histogram of t-values")
    abline(v=t[1])

2. Olympic Medals

During both summer and winter Olympic games the medal table is often of interest to spectators and the media. The medal table is a tally of the number of medals which have been won by each participating country during the games. A good performance on the medal table is often a source of pride for a country. However, it is to be expected that large countries will win more medals than smaller countries, because they have a larger pool from which to recruit athletes. Thus smaller countries often argue that a better measure of performance would be medals per capita. However, it is possible that the medal tally shouldn't be expected to increase in direct proportion to population. Further, it is reasonable to think that the medal tally will also depend on the resources available to athletes in a country, or on the climate (for example, access to snow).

The objective of this analysis is to explore the relationship between a country's medal tally, population size, wealth (measured by GDP) and climate (approximated by latitude).
Further, it is proposed that in future a standardised measure of a country's medal tally should be developed which corrects for population size, climate and wealth. You should investigate the feasibility of this proposal, and discuss your findings.

The file medals.RData (load("medals.RData") in R) is an R data frame with one row for every country that has won at least one Olympic medal in the previous four Olympic Games.

The variable descriptions are as follows:

Country       Name of the competing country (only countries which have won at least one medal since 2004 are included).
Latitude      Latitude of the capital city.
Summer2004    Total number of medals (gold, silver and bronze) won at the Summer Olympics in 2004.
Summer2008    Total number of medals won at the Summer Olympics in 2
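One possible way to approach the stated objective is sketched below, as an illustration only and not as the expected answer. Because medals.RData is not available here, the sketch runs on simulated data. The column names Population and GDP are assumptions (only Country, Latitude, Summer2004 and Summer2008 are named above), and the choice of a Poisson regression with a log link is also an assumption. The idea is that if medals grew in direct proportion to population, the coefficient on log(Population) would be 1, and that the ratio of observed to fitted medals is one candidate for a standardised tally.

```r
# Simulated stand-in for medals.RData. All values are synthetic, and the
# column names Population and GDP are hypothetical.
set.seed(1)
n <- 80
medals <- data.frame(
  Latitude   = runif(n, -40, 65),
  Population = exp(rnorm(n, mean = 16, sd = 1.5)),
  GDP        = exp(rnorm(n, mean = 25, sd = 1.5))
)

# Synthetic medal counts that grow less than proportionally with population
# (exponent 0.4 on Population is an arbitrary choice for the simulation).
mu <- exp(-12 + 0.4 * log(medals$Population) + 0.3 * log(medals$GDP) +
          0.01 * abs(medals$Latitude))
medals$Summer2008 <- rpois(n, mu)

# Poisson regression of the medal tally on population, wealth and climate.
fit <- glm(Summer2008 ~ log(Population) + log(GDP) + abs(Latitude),
           family = poisson, data = medals)
coefs  <- coef(summary(fit))
b_pop  <- coefs["log(Population)", "Estimate"]
se_pop <- coefs["log(Population)", "Std. Error"]

# Wald z statistic for H0: coefficient = 1, i.e. medals proportional to population.
z_prop <- (b_pop - 1) / se_pop

# Candidate standardised measure: observed medals relative to the number
# expected from population, GDP and latitude.
medals$Standardised <- medals$Summer2008 / fitted(fit)

print(round(coefs, 3))
print(z_prop)
```

With the real data, a strongly negative z_prop would suggest that medals per capita over-corrects for population. The same model could be refitted to each Games to check whether the standardised measure is stable.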
