Applied Multivariate Statistical Analysis.ppt

Resource ID: 5723439  File size: 1.87 MB  Pages: 38
Format: PPT


Preface to the 1st Edition

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate. This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications.

The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference to or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from md-stat. The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies on md-stat is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

Weeks 1-2

UNIT-I Descriptive Techniques
1 Comparison of Batches
1.1 Boxplots
1.2 Histograms
1.3 Scatterplots
1.4 Data Set - Boston Housing

1 Comparison of Batches

Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set $\{x_i\}_{i=1}^{n}$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^p$. That is, we suppose that each observation $x_i$ has $p$ dimensions, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and that it is an observed value of a variable vector $X \in \mathbb{R}^p$. Therefore, $X$ is composed of $p$ random variables, $X = (X_1, X_2, \ldots, X_p)$, where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable.

How do we begin to analyze this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:
Are there components of $X$ that are more spread out than others?
Are there some elements of $X$ that indicate subgroups of the data?
Are there outliers in the components of $X$?
How "normal" is the distribution of the data?

1.1 Boxplots

Vocabulary: genuine (adj. real, authentic)

[Figure: boxplots for the variables X6 and X1]

Summary
The median and mean bars are measures of location.
The relative location of the median (and the mean) in the box is a measure of skewness.
The lengths of the box and whiskers are measures of spread.
The length of the whiskers indicates the tail length of the distribution.
The outlying points are marked with one of two different symbols, depending on whether they are outside of $F_{U,L} \pm 1.5\,d_F$ or $F_{U,L} \pm 3\,d_F$, respectively (that is, beyond the upper fourth $F_U$ or the lower fourth $F_L$ by more than 1.5 or 3 times the F-spread $d_F = F_U - F_L$).
The boxplots do not indicate multimodality or clusters.
If we compare the relative size and location of the boxes, we are comparing distributions.

Reading material

1.2 Histograms

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a bin grid starting at $x_0$:

$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$

where $[\cdot,\cdot)$ denotes a left-closed and right-open interval.

If $\{x_i\}_{i=1}^{n}$ is an i.i.d. sample with density $f$, the histogram is defined as follows:

$\hat{f}_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \quad (1.7)$

In sum (1.7) the first indicator function $I\{x_i \in B_j(x_0, h)\}$ counts the number of observations falling into bin $B_j(x_0, h)$. The second indicator function $I\{x \in B_j(x_0, h)\}$ is responsible for "localizing" the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins. An $h$ that is too large leads to very big blocks and thus to a very unstructured histogram. On the other hand, an $h$ that is too small gives a very variable estimate with many unimportant peaks.

[Figure 1.6: histograms of the diagonal of the counterfeit bank notes, panels h = 0.1, h = 0.2, h = 0.3 and h = 0.4]

Vocabulary: diagonal (adj. of a diagonal, slanted; n. a diagonal line); counterfeit (adj. fake, forged)

The effect of $h$ is given in detail in Figure 1.6. It contains the histogram (upper left) for the diagonal of the counterfeit bank notes for $x_0 = 137.8$ (the minimum of these observations) and $h = 0.1$. Increasing $h$ to $h = 0.2$ and using the same origin, $x_0 = 137.8$, results in the histogram shown in the lower left of the figure. This density histogram is somewhat smoother due to the larger $h$. The binwidth is next set to $h = 0.3$ (upper right). From this histogram, one has the impression that the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9. The detection of modes requires a fine tuning of the binwidth. Using methods from smoothing methodology one can find an "optimal" binwidth $h$ for $n$ observations:

$h_{opt} = \left( \frac{24\sqrt{\pi}}{n} \right)^{1/3}.$

In Figure 1.7, we show histograms with $x_0 = 137.65$ (upper left), $x_0 = 137.75$ (lower left), $x_0 = 137.85$ (upper right), and $x_0 = 137.95$ (lower right). All the graphs have been scaled equally on the y-axis to allow comparison. One sees that, despite the fixed binwidth $h$, the interpretation is not facilitated. The shift of the origin $x_0$ (to 4 different locations) created 4 different histograms. This property of histograms strongly contradicts the goal of presenting data features.

Summary
Modes of the density are detected with a histogram.
Modes correspond to strong peaks in the histogram.
Histograms with the same $h$ need not be identical. They also depend on the origin $x_0$ of the grid.
The influence of the origin $x_0$ is drastic. Changing $x_0$ creates different-looking histograms.
The consequence of an $h$ that is too large is an unstructured histogram that is too flat.
A binwidth $h$ that is too small results in an unstable histogram.
There is an "optimal" $h = (24\sqrt{\pi}/n)^{1/3}$.
It is recommended to use averaged histograms. They are kernel densities.

1.4 Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other. They help us understand relationships among the variables of a data set. A downward-sloping scatter indicates that as we increase the variable on the horizontal axis, the variable on the vertical axis decreases. An analogous statement can be made for upward-sloping scatters.

Figure 1.12 plots the 5th column (upper inner frame) of the bank data against the 6th column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison, a good separation between genuine and counterfeit bank notes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Figure 1.12 corresponds to the true bank notes. As noted before, this separation is not distinct, since the two groups overlap somewhat.

Vocabulary: draftman (a person who makes drawings or plans)

Summary
Scatterplots in two and three dimensions help in identifying separated points, outliers or sub-clusters.
Scatterplots help us in judging positive or negative dependencies.
Draftman scatterplot matrices help detect structures conditioned on values of other variables.
As the brush of a scatterplot matrix moves through a point cloud, we can study conditional dependence.

1.8 Data Set

Boston Housing Data Set

Vocabulary: variable (adj. changeable, unsettled; n. a variable, something that varies)

First Step: New Words (first group of high-frequency words, 160 items)
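As a small illustration of the boxplot summary in Section 1.1, the following Python sketch computes a five-number summary and flags outlying points beyond 1.5 and 3 times the F-spread. It assumes that the fourths F_L and F_U are taken as the medians of the lower and upper halves of the sorted data (Tukey's hinges), which is one common convention and may differ slightly from the quartile rule used by a given plotting package. The function names and the sample data are hypothetical and chosen only for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower fourth F_L, median, upper fourth F_U, maximum).

    Assumption: the fourths are the medians of the lower and upper halves
    of the sorted data, with the median included in both halves when the
    number of observations is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    f_l = median(xs[:half])
    f_u = median(xs[n - half:])
    return xs[0], f_l, median(xs), f_u, xs[-1]


def classify_outliers(values):
    """Split outlying points into (mild, extreme).

    Mild points lie beyond F_L - 1.5*d_F or F_U + 1.5*d_F but within the
    3*d_F bounds; extreme points lie beyond F_L - 3*d_F or F_U + 3*d_F.
    Here d_F = F_U - F_L is the F-spread.
    """
    _, f_l, _, f_u, _ = five_number_summary(values)
    d_f = f_u - f_l
    inner_low, inner_high = f_l - 1.5 * d_f, f_u + 1.5 * d_f
    outer_low, outer_high = f_l - 3.0 * d_f, f_u + 3.0 * d_f
    mild, extreme = [], []
    for x in values:
        if x < outer_low or x > outer_high:
            extreme.append(x)
        elif x < inner_low or x > inner_high:
            mild.append(x)
    return mild, extreme


# Hypothetical sample, not taken from the bank note data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
summary = five_number_summary(data)
mild, extreme = classify_outliers(data)
print(summary)   # (1, 3.5, 6, 8.5, 40)
print(mild)      # [20]
print(extreme)   # [40]
```

For this sample the F-spread is 5, so the inner bounds are -4 and 16 and the outer bounds are -11.5 and 23.5, which is why 20 is a mild and 40 an extreme outlying point.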
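The histogram estimate of Section 1.2, built on the bins B_j(x0, h) = [x0 + (j-1)h, x0 + jh), can be sketched in a few lines of Python. The sketch below evaluates the estimate at a single point by counting the observations that share a bin with that point and dividing by n*h. It also includes the "optimal" binwidth rule of thumb (24*sqrt(pi)/n)^(1/3), which, as far as I understand, is derived under a standard normal reference density. The function names and the sample values are hypothetical and are not the bank note data.

```python
import math


def bin_index(x, x0, h):
    """Index j of the bin B_j(x0, h) = [x0 + (j-1)h, x0 + jh) that contains x."""
    return math.floor((x - x0) / h) + 1


def histogram_density(x, sample, x0, h):
    """Histogram estimate at x: (observations in the bin of x) / (n * h)."""
    j = bin_index(x, x0, h)
    count = sum(1 for xi in sample if bin_index(xi, x0, h) == j)
    return count / (len(sample) * h)


def optimal_binwidth(n):
    """Rule-of-thumb binwidth (24 * sqrt(pi) / n) ** (1/3) for n observations."""
    return (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)


# Hypothetical sample chosen so that no value sits on a bin boundary.
sample = [0.1, 0.3, 0.6, 0.7, 0.9, 1.4]
h = 0.5

# Same binwidth, two different origins: the estimate at x = 0.2 changes.
at_origin_0 = histogram_density(0.2, sample, x0=0.0, h=h)      # 2 / (6 * 0.5)
at_origin_025 = histogram_density(0.2, sample, x0=0.25, h=h)   # 1 / (6 * 0.5)
print(at_origin_0, at_origin_025)
print(optimal_binwidth(len(sample)))
```

With the origin at 0.0 the point 0.2 shares the bin [0, 0.5) with two observations, while with the origin at 0.25 it falls into [-0.25, 0.25) with only one. This is a small numerical instance of the origin dependence described for Figure 1.7.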
