《判别与分类》PPT课件.ppt
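11 Discriminant Analysis

Sec. 1 Introduction
Sec. 2 Classification for Two Populations
Sec. 3 Classification with Two Normal Populations
Sec. 4 Evaluating Discriminant Functions
Sec. 5 Classification with Several Populations
Sec. 6 Canonical Discriminant Functions
Sec. 7 Logistic Regression and Discriminant Analysis
Sec. 8 Remarks

History

The ideas associated with discriminant analysis can be traced back to the 1920s and to work by the English statistician Karl Pearson and others on intergroup distances, e.g., the coefficient of racial likeness (CRL) (Huberty, 1994). In the 1930s R. A. Fisher translated multivariate intergroup distance into a linear combination of variables to aid in intergroup discrimination. Methodologists from Harvard University contributed much to the interest in applying discriminant analysis in education and psychology in the 1950s and 1960s (Huberty, 1994). Klecka (1980) provided several historical references that deal mostly with early applications of DA.

2023/7/10, 中国人民大学六西格玛质量管理研究中心

Chapter 4 Discriminant Analysis

The popularity of regression models rests on their use in predicting and explaining metric variables. When the dependent variable is nonmetric, however, multiple regression is not appropriate. Discriminant analysis, introduced in this chapter, handles the case where the dependent variable is nonmetric. Here one wants to predict and explain the relationships that determine which category an object belongs to, for example why someone is or is not a customer, or whether a company succeeds or goes bankrupt. Discriminant analysis is widely used whenever the main goal is to identify the group to which an individual belongs. Potential applications include predicting the success or failure of a new product, deciding whether a student should be admitted, grouping students by vocational interest, determining a person's credit-risk category, or predicting whether a firm will succeed. In each case the objects fall into groups, and the aim is to predict or explain each object's group membership from explanatory variables chosen by the analyst.

4.1 Basic Theory of Discriminant Analysis

Sometimes a problem involves a categorical dependent variable and several metric explanatory variables, and an appropriate method of analysis must be chosen. For example, we may wish to distinguish good from poor credit risks. If a metric measure of credit risk were available, multiple regression could be used. Often, however, we can only say whether a person falls in the good or the poor class, which is not the metric type of variable that multiple regression requires. When the dependent variable is categorical and the explanatory variables are metric, discriminant analysis is the appropriate statistical method. Discriminant analysis can handle two or more groups. With two groups it is called two-group discriminant analysis; with three or more groups it is called multiple discriminant analysis (MDA).

Assumptions of discriminant analysis

The most basic requirement is that there be at least two groups and that, at the first stage of the analysis, each group contain at least one case. The explanatory variables must be measurable, so that their means and variances can be computed and used sensibly in statistical functions.

Assumption 1: no discriminating (explanatory) variable may be a linear combination of the other discriminating variables; that is, there is no multicollinearity.

Assumption 2: the covariance matrices of the groups are equal. The simplest and most common form of discriminant analysis uses linear discriminant functions, which are simple linear combinations of the discriminating variables. Under the equal-covariance assumption, the discriminant functions and the significance tests can be computed with very simple formulas.

Assumption 3: the discriminating variables follow a multivariate normal distribution, i.e., each variable is normally distributed for fixed values of all the other variables. Under this condition the significance tests and the probabilities of group membership can be computed exactly. When the assumption is violated, the computed probabilities can be very inaccurate.

Overview

Discriminant function analysis, a.k.a. discriminant analysis or DA, is used mainly for classification. A good discriminant function should have a high rate of correct classification. Discriminant function analysis is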
found in SPSS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two categories or more than two.

There are several purposes for DA and/or MDA:
1. To classify cases into groups using a discriminant prediction equation.
2. To test theory by observing whether cases are classified as predicted.
3. To investigate differences between or among groups.
4. To determine the most parsimonious way to distinguish among groups.
5. To determine the percent of variance in the dependent variable explained by the independents.
6. To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.
7. To assess the relative importance of the independent variables in classifying the dependent variable.
8. To discard variables which are little related to group distinctions.
9. To infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test whether the discriminant model is significant; (2) if the F test is significant, the independent variables are examined for differences across the groups, and these differences are used to classify the dependent variable.

Anesthetic example

Suppose an anesthesiologist needs to determine whether an anesthetic is safe for a person who is having a heart operation. Based on criteria of this kind, the anesthesiologist would like to know the following:
1. Can this knowledge be used to construct a rule that will classify new patients as safe or unsafe recipients of the anesthetic?
2. What is the rule, and can it be used to classify new patients?
3. What are the chances of making mistakes when using the rule?

Discriminant analysis is a multivariate technique for building rules that classify samples appropriately. It resembles regression analysis, except that the dependent (response) variable is qualitative rather than continuous. Discriminant analysis is also called classification analysis.

Goal of discrimination: to describe the features that distinguish individuals from different populations (classes), using discriminants or classifiers that separate the classes as much as possible.
Goal of classification: to sort individuals into the classes. The problem is to find a good rule that classifies new individuals optimally.

11.2 Classification for Two Populations

The main problems are (1) separating two classes of objects, or (2) assigning a new object to one of the two classes. Label the two classes $\pi_1$ and $\pi_2$. The objects are separated or
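classified on the basis of measurements on $p$ associated random variables $X^T = [X_1, X_2, \ldots, X_p]$. The observed values of $X$ differ to some extent from one class to the other.

We regard the objects of the first class as a population $\pi_1$ and those of the second class as a population $\pi_2$. The two populations have probability density functions $f_1(x)$ and $f_2(x)$, and consequently we can discuss how to decide which class an object belongs to.

Example 11.1. Consider two groups in a town: $\pi_1$, lawn-mower owners, and $\pi_2$, those without one. In order to identify the best prospects for an intensive sales campaign, a manufacturer is interested in classifying families as prospective owners or nonowners on the basis of $x_1$ = income and $x_2$ = lot size. Random samples of $n_1 = 12$ current owners and $n_2 = 12$ current nonowners are selected. The sample observations yield the scatter plot (Figure 11.1).

Remark
1. A good classification procedure should produce few misclassifications.
2. Prior probabilities should be taken into account.
3. The cost of misclassification should be taken into account (e.g., when diagnosing a disease).

Basic idea

Let $f_1(x)$ and $f_2(x)$ be the density functions of the two populations $\pi_1$ and $\pi_2$. Our aim is to assign an observation $x$ to one of the two populations. Let $\Omega$ be the whole sample space, and let $R_1$ be the set of $x$ values for which we assign the object to $\pi_1$; if instead $x$ falls in $R_2 = \Omega - R_1$, we assign it to $\pi_2$. The sets $R_1$ and $R_2$ are mutually exclusive and together make up $\Omega$.

Let $p_1$ be the prior probability of $\pi_1$ and $p_2$ the prior probability of $\pi_2$, where $p_1 + p_2 = 1$. Writing $P(k|i) = P(X \in R_k \mid \pi_i) = \int_{R_k} f_i(x)\,dx$, we have

$$\begin{aligned}
P(\text{observation is correctly classified as } \pi_1) &= P(X \in R_1 \mid \pi_1)\,P(\pi_1) = P(1|1)\,p_1 \\
P(\text{observation is misclassified as } \pi_1) &= P(X \in R_1 \mid \pi_2)\,P(\pi_2) = P(1|2)\,p_2 \\
P(\text{observation is correctly classified as } \pi_2) &= P(X \in R_2 \mid \pi_2)\,P(\pi_2) = P(2|2)\,p_2 \\
P(\text{observation is misclassified as } \pi_2) &= P(X \in R_2 \mid \pi_1)\,P(\pi_1) = P(2|1)\,p_1
\end{aligned} \tag{11-3}$$

The costs of misclassification can be laid out in a cost matrix:

                        Classified as pi_1    Classified as pi_2
  True population pi_1          0                   c(2|1)
  True population pi_2        c(1|2)                  0

where $c(2|1)$ is the cost of assigning an object from $\pi_1$ to $\pi_2$, and $c(1|2)$ is the cost of assigning an object from $\pi_2$ to $\pi_1$.

The average, or expected, cost of misclassification (ECM) is then

$$\mathrm{ECM} = c(2|1)\,P(2|1)\,p_1 + c(1|2)\,P(1|2)\,p_2 \tag{11-5}$$

A reasonable classification rule should have an ECM that is as small as possible.

Result 11.1. The regions $R_1$ and $R_2$ that minimize the ECM are defined by the inequalities

$$R_1:\ \frac{f_1(x)}{f_2(x)} \ge \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1}, \qquad R_2:\ \frac{f_1(x)}{f_2(x)} < \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1}$$

Proof of Result 11.1

We need to show that the regions $R_1$ and $R_2$ that minimize the ECM are defined by the values $x$ for which the inequalities above hold. Substituting the expressions for $P(2|1)$ and $P(1|2)$ into (11-5) gives

$$\mathrm{ECM} = c(2|1)\,p_1 \int_{R_2} f_1(x)\,dx + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx .$$

Since $\Omega = R_1 \cup R_2$, we have $\int_{R_2} f_1(x)\,dx = 1 - \int_{R_1} f_1(x)\,dx$, so

$$\mathrm{ECM} = c(2|1)\,p_1 + \int_{R_1} \left[ c(1|2)\,p_2 f_2(x) - c(2|1)\,p_1 f_1(x) \right] dx .$$

The ECM is minimized when $R_1$ contains exactly those $x$ for which the integrand is nonpositive, i.e. $c(2|1)\,p_1 f_1(x) \ge c(1|2)\,p_2 f_2(x)$. We get Result 11.1.

Special cases (11-7):
(a) equal prior probabilities, $p_2/p_1 = 1$: $R_1:\ f_1(x)/f_2(x) \ge c(1|2)/c(2|1)$;
(b) equal misclassification costs, $c(1|2)/c(2|1) = 1$: $R_1:\ f_1(x)/f_2(x) \ge p_2/p_1$;
(c) equal priors and equal costs: $R_1:\ f_1(x)/f_2(x) \ge 1$.
In each case $R_2$ is given by the reverse strict inequality.

Suppose a new observation $x_0$ has $f_1(x_0) = .3$ and $f_2(x_0) = .4$. To which population should it be assigned? The density ratio is $f_1(x_0)/f_2(x_0) = .75$; comparing it with the threshold $\frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1}$ for the costs and priors of the example, we find $x_0 \in R_1$, so it should be assigned to $\pi_1$.

Other criteria

Minimizing the total probability of misclassification (TPM):

$$\mathrm{TPM} = P(\text{misclassifying a } \pi_1 \text{ observation or misclassifying a } \pi_2 \text{ observation}) = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx \tag{11-8}$$

Mathematically, this problem is equivalent to minimizing the expected cost of misclassification when the misclassification costs are equal. The optimal regions in this case are therefore given by (b) in (11-7).

Maximum posterior probability: allocate $x_0$ to $\pi_1$ when $P(\pi_1 \mid x_0) > P(\pi_2 \mid x_0)$, where

$$P(\pi_i \mid x_0) = \frac{p_i f_i(x_0)}{p_1 f_1(x_0) + p_2 f_2(x_0)}, \qquad i = 1, 2.$$

Note: this is the same as the total-probability-of-misclassification rule (b) in (11-7), because the two posterior probabilities have the same denominator. Still, computing the probabilities of $\pi_1$ and $\pi_2$ after observing $x_0$ is often useful for identifying the less clear-cut assignments.

11.3 Classification with Two Normal Populations

Classification procedures based on normal populations are simple and efficient. Assume that $f_1(x)$ and $f_2(x)$ are multivariate normal densities, with mean vector $\mu_1$ and covariance matrix $\Sigma_1$, and mean vector $\mu_2$ and covariance matrix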
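$\Sigma_2$.

(II) Distance discrimination for two populations

Consider first the case of two populations. Let $G_1$ and $G_2$ be two $p$-dimensional normal populations with means $\mu_1$, $\mu_2$ and the same covariance matrix $\Sigma$. Given a sample $y$, we must decide which population it comes from, and the most intuitive idea is to compute the distance from $y$ to each population. Using the Mahalanobis distance $d^2(y, G_i) = (y - \mu_i)^T \Sigma^{-1} (y - \mu_i)$, the rule is: assign $y$ to $G_1$ if $d^2(y, G_1) \le d^2(y, G_2)$, and to $G_2$ otherwise.

1. Equal covariance matrices. Since $d^2(y, G_2) - d^2(y, G_1) = 2W(y)$, the rule above can be written as: assign $y$ to $G_1$ if $W(y) \ge 0$ and to $G_2$ if $W(y) < 0$, where

$$W(y) = a^T (y - \bar\mu), \qquad a = \Sigma^{-1}(\mu_1 - \mu_2), \qquad \bar\mu = \tfrac{1}{2}(\mu_1 + \mu_2).$$

When $\mu_1$, $\mu_2$ and $\Sigma$ are known, $a$ is a known $p$-dimensional vector and $W(y)$ is a linear function of $y$, called the linear discriminant function; $a$ is called the vector of discriminant coefficients. Discrimination with a linear discriminant function is very intuitive and the most convenient to use, and it is also the most widely applied in practice.

Result 11.2. Assume that the two populations $\pi_1$ and $\pi_2$ have the densities in (11-10), i.e. normal densities with common covariance matrix $\Sigma$. Then the allocation rule that minimizes the ECM is: allocate $x_0$ to $\pi_1$ if

$$(\mu_1 - \mu_2)^T \Sigma^{-1} x_0 - \tfrac{1}{2} (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2) \ge \ln\left[ \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1} \right]$$

Allocate $x_0$ to $\pi_2$ otherwise.

The discriminant function in this rule is now a linear function of $x_0$.

Proof of Result 11.2

Since the quantities in (11-11) are nonnegative for all $x$, we can take their natural logarithms and preserve the order of the inequalities. Moreover,

$$-\tfrac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \tfrac{1}{2}(x - \mu_2)^T \Sigma^{-1} (x - \mu_2) = (\mu_1 - \mu_2)^T \Sigma^{-1} x - \tfrac{1}{2} (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2).$$

Consequently, combining this with (11-11), we get the result.

When the population parameters $\mu_1$, $\mu_2$ and $\Sigma$ are unknown, Wald and Anderson suggest replacing them by their sample counterparts.

Example (hemophilia A carriers)

$\pi_1$: normal group, $n_1 = 30$. $\pi_2$: hemophilia A carriers, $n_2 = 22$.

From the sample information, with equal costs and equal prior probabilities, we obtain the allocation rule. If $x_0^T = [-.210, -.044]$, then $\hat y_0 = -6.62$, which is below the cutoff $\hat m = -4.61$, so we assign $x_0$ to $\pi_2$.

Now suppose the prior probabilities are known: $p_1 = .75$, $p_2 = .25$, and assume $c(1|2) = c(2|1)$. The discriminant statistic is $\hat w = \hat y_0 - \hat m = -6.62 - (-4.61) = -2.01$. Applying (11-18), we see that $\hat w = -2.01 < \ln(p_2/p_1) = \ln(.25/.75) = -1.10$, so we assign $x_0$ to $\pi_2$, an obligatory carrier.

Classification when $\Sigma_1 \ne \Sigma_2$

If the covariance matrices are unequal, the allocation rule is as follows. Allocate $x_0$ to $\pi_1$ if

$$-\tfrac{1}{2} x_0^T (\Sigma_1^{-1} - \Sigma_2^{-1}) x_0 + (\mu_1^T \Sigma_1^{-1} - \mu_2^T \Sigma_2^{-1}) x_0 - k \ge \ln\left[ \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1} \right],$$

where $k = \tfrac{1}{2} \ln\left( |\Sigma_1| / |\Sigma_2| \right) + \tfrac{1}{2} (\mu_1^T \Sigma_1^{-1} \mu_1 - \mu_2^T \Sigma_2^{-1} \mu_2)$. Allocate $x_0$ to $\pi_2$ otherwise.

11.4 Evaluating Classification Functions

An important way to judge a classification procedure is to compute its error rate, or misclassification rate. The total probability of misclassification is

$$\mathrm{TPM} = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx .$$

The smallest value of this quantity, obtained by a judicious choice of $R_1$ and $R_2$, is called the optimum error rate (OER), where $R_1$ and $R_2$ are determined by (b) in (11-7).

The performance of a sample classification function can be evaluated by the actual error rate (AER). In general the AER cannot be calculated, because it depends on the unknown density functions. It is replaced by the apparent error rate (APER), defined as the fraction of observations in the training sample that are misclassified.

11.5 Classification with Several Populations

1. Minimum expected cost of misclassification. Let $f_i(x)$ be the density associated with population $\pi_i$, $i = 1, 2, \ldots, g$. Let
$p_i$ = the prior probability of population $\pi_i$, $i = 1, 2, \ldots, g$;
$c(k|i)$ = the cost of allocating an item to $\pi_k$ when it belongs to $\pi_i$, for $k, i = 1, 2, \ldots, g$;
$R_k$ = the set of $x$'s classified as $\pi_k$.

Result 11.5. The classification regions that minimize the ECM (11-37) are obtained by allocating $x$ to the population $\pi_k$, $k = 1, 2, \ldots, g$, for which

$$\sum_{i=1,\, i \ne k}^{g} p_i f_i(x)\, c(k|i)$$

is smallest. If the minimum is not unique, $x$ can be assigned to any of the populations attaining it. For a proof see 张尧庭 et al. (209).

Classification with normal populations

(1) Unequal covariance matrices: the quadratic discriminant score is

$$d_i^Q(x) = -\tfrac{1}{2} \ln |\Sigma_i| - \tfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \ln p_i ,$$

and $x$ is allocated to the $i$-th population if $d_i^Q(x)$ is the largest of the $g$ scores (11.46).

(2) Equal covariance matrices: when $\Sigma_i = \Sigma$ for all $i$, the terms of the score that do not depend on $i$ can be dropped, so we can define the linear discriminant score

$$d_i(x) = \mu_i^T \Sigma^{-1} x - \tfrac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln p_i .$$

11.6 Fisher's Discriminant Function

Fisher's idea: transform the multivariate observation $x$ into a univariate observation $y$ such that the $y$'s from populations $\pi_1$ and $\pi_2$ are separated as much as possible.

A fixed linear combination of the $x$'s takes the values $y_{11}, y_{12}, \ldots, y_{1n_1}$ for the observations from $\pi_1$ and the values $y_{21}, y_{22}, \ldots, y_{2n_2}$ for the observations from $\pi_2$. The separation of these two sets of univariate $y$'s is assessed in terms of the difference between $\bar y_1$ and $\bar y_2$ expressed in standard deviation units. That is,

$$\text{separation} = \frac{|\bar y_1 - \bar y_2|}{s_y}, \qquad s_y^2 = \frac{\sum_{j=1}^{n_1} (y_{1j} - \bar y_1)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar y_2)^2}{n_1 + n_2 - 2}.$$

After squaring, the numerator corresponds to the between-group variation and the denominator to the within-group variation.

Canonical discriminant functions

The idea of the canonical discriminant function is due to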
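Fisher, who first proposed it. Canonical discriminant analysis constructs new variables by linear transformations of the original variables. The canonical variables are built so that they retain the useful information in the original set of variables. In other words, they resemble principal components and factor analysis, although they are computed differently.

Whether or not the canonical functions can be interpreted, their advantage is that they reduce the dimension of the data, so that the data can be visualized. Canonical functions also let researchers develop simple discriminant rules.

http://www.ats.ucla.edu/stat/spss/dae/discrim.htm

The idea of canonical analysis:

The first canonical discriminant function

Suppose the researcher has $n_i$ observations from population $G_i$, and that this population is distributed as $N_p(\mu_i, \Sigma)$, for $i = 1, 2, \ldots, k$, so that the populations have the same covariance matrix.

Let $\bar x_i$ be the sample mean of group $i$ and $\bar x$ the overall sample mean. Then the between-group variation is

$$B = \sum_{i=1}^{k} n_i (\bar x_i - \bar x)(\bar x_i - \bar x)^T ,$$

and the within-group variation is

$$E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)^T .$$

The idea of canonical discriminant analysis is to project the original data so that, after the transformation, the differences between groups are as large as possible and the differences within groups as small as possible; that is, to choose $a$ to maximize the ratio $\dfrac{a^T B a}{a^T E a}$.

It can be shown that the maximum of this ratio is the largest eigenvalue of $E^{-1}B$, and that $a_1$ is the eigenvector of $E^{-1}B$ corresponding to that largest eigenvalue. The linear combination $y_1 = a_1^T x$ is the single linear discriminant function that provides the greatest separation among the populations.

Here F can be used to test whether the group means are equal.

$a_1$ is the eigenvector belonging to the largest eigenvalue

Proof (with some changes of notation: $A$ and $V$ play the roles of $B$ and $E$). Why do $V^{-1/2} A V^{-1/2}$ and $V^{-1} A$ have the same eigenvalues? Because $AB$ and $BA$ have the same nonzero eigenvalues.

The eigenvectors of $V^{-1}A$ are the coefficient vectors $a$ we are looking for.

An effective classification of $x$ can be based on $a_1^T x, a_1^T \mu_1, \ldots, a_1^T \mu_k$. Let $d_i = |a_1^T x - a_1^T \mu_i|$. If $d_i$ is the smallest of these distances, $x$ should be assigned to the $i$-th population.

The second canonical discriminant function

$y_2 = a_2^T x$, and

$$d_i^2 = (a_1^T x - a_1^T \mu_i)^2 + (a_2^T x - a_2^T \mu_i)^2 .$$

Assign $x$ to the population that gives the minimum value of $d_i^2$.

Result 11.6

Determining the dimensionality of the canonical space

The dimensionality $s$ of the canonical space is bounded above by the minimum of $p$ and $m - 1$, where $m$ is the number of groups. We can construct scree plots of the eigenvalues, or consider what proportion of the total variability is accounted for by each canonical function, and select enough functions to account for a large proportion of the total variability.

Example 7.3

Recode the varieties in the iris data as 1, 2, 3, then run discriminant analysis with the iris variety as the grouping variable.

Example 11.1

  data gpa;
    infile 'T11-6.dat';
    input gpa gmat admit;
  run;
  proc discrim data=gpa pool=yes manova wcov pcov listerr crosslisterr;
    class admit;
    var gpa gmat;
  run;

Note: the prior probabilities are left out.

http://www.stat.lsu.edu/faculty/moser/exst7037/discrim.html

  proc candisc data=Iris all out=OIris;
    class Species;
    var y1 y2;
  run;

This shows a test for homogeneity of the variance-covariance matrices of the three varieties. The test is significant, so the hypothesis of equal covariance matrices would be rejected. Linear discriminant functions often work quite well even though the variance-covariance matrices are unequal. If the probabilities of correct classification are high enough to satisfy the user, then the user should not be too concerned that he is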
using a linear discriminant rule rather than a quadratic rule. SPSS cannot produce a quadratic rule.

We see the eigenvalues of $W^{-1}B$, as well as statistical tests for determining the dimensionality of the canonical space. For this example, both eigenvalues are significant (reported p = 0.0000). The first accounts for 99.1% of the total variability, so the second is not important. The means of the three varieties therefore come close to lying on a straight line within the four-dimensional sample space.

These define the standardized canonical functions, which can be applied to standardized data to determine the projections of the data points onto the canonical space.

The first table lists the vectors that define the unstandardized canonical functions. For example, we could compute unstandardized canonical scores via
  Can1 = -.829*SL - 1.5*SW + 2.2*PL + 2.8*PW - 2.1
The locations of the three variety means in the unstandardized canonical space are shown at the bottom. They are plotted on the territorial map, where their locations are marked by (*).

SPSS also locates the perpendicular bisectors between the variety means on this plot. SPSS calls this plot a territorial map. These bisectors divide the canonical space into three distinct regions. If the projection of a new data point falls into one of the regions, then the new point is closest to the mean (*) in that region, which determines the variety to which the observation is assigned.

This table lists how each iris plant in the data set would be classified by the discriminant rule that was developed. The column labeled ACTUAL GROUP identifies the variety from which the observation came; the column labeled Highest shows the variety to which the observation would be assigned by the discriminant rule.

The first column, labeled P(G/D), is the posterior probability of the group to which the observation is assigned. This posterior probability is 0.885 for case 3. SPSS gives posterior probabilities only for the best group and the second-best group. The second column, P(D/G), can be ignored. The last column, labeled
DISCRIM SCORES, gives the locations of the projections of the observations onto the canonical space. A plot of these projections for all observations in the data set is shown on page 271.

This table gives a summary of the classification results by the resubstitution method. It shows that all 50 of the variety 1 plants are reclassified into variety 1, 48 of the variety 2 plants are reclassified into variety 2, and 49 of the variety 3 plants are reclassified into variety 3. Overall, 98% of the observations are correctly classified. These classifications are made by applying the rule to the same data used to build the rule, so they may overestimate the actual probability of correct classification. SPSS does not have an option for cross-validation of the data as was available in SAS. SPSS does allow us to create a holdout data set by using its select option.

Page 271

The projections of the means of the three iris groups are also marked by asterisks on this plot. The plot is interesting in that variety 1 is clearly distinct from varieties 2 and 3, and there is very little overlap between varieties 2 and 3. So discrimination should be quite good for the iris data.

Steps and flowchart of discriminant analysis

The logical flowchart of discriminant analysis is as follows:

Figure 4.1 Flowchart of the steps of discriminant analysis

Steps of discriminant analysis
1. Collect the data.
2. Compute the prior probabilities.
3. Test whether the covariance matrices are equal, to decide between linear and quadratic discriminant functions.
4. Estimate the parameters of the conditional densities $f(x \mid \pi_i)$.
5. Compute the discriminant functions.
6. Estimate the misclassification rate by cross-validation.
7. Carry out the allocation.
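The steps listed above can be sketched in a few lines of Python for the simplest case: two groups, two variables, equal misclassification costs, and a pooled covariance matrix. This is only a minimal illustration under those assumptions, not the procedure used by SPSS or SAS. The data values and the function names (`fit`, `allocate`, `loo_error`) are hypothetical, and step 3 (testing equality of the covariance matrices) is skipped by simply assuming the matrices are equal.

```python
import math

# Step 1: hypothetical training data, two variables per observation.
G1 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]   # sample from pi_1
G2 = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]   # sample from pi_2

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    # 2x2 matrix of sums of squares and cross-products about the group mean
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit(g1, g2, p1, p2):
    """Steps 4-5: estimate means and pooled covariance, then return (a, cutoff)
    for the rule 'allocate x to pi_1 if a'x >= cutoff' (equal costs assumed)."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    dof = len(g1) + len(g2) - 2
    sp = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    inv = [[sp[1][1] / det, -sp[0][1] / det],
           [-sp[1][0] / det, sp[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # a = S_pooled^{-1} (xbar_1 - xbar_2)
    a = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # midpoint m = (1/2) a'(xbar_1 + xbar_2); priors shift it by ln(p2/p1)
    midpoint = 0.5 * (a[0] * (m1[0] + m2[0]) + a[1] * (m1[1] + m2[1]))
    return a, midpoint + math.log(p2 / p1)

def allocate(x, a, cutoff):
    # Step 7: return 1 for pi_1, 2 for pi_2
    return 1 if a[0] * x[0] + a[1] * x[1] >= cutoff else 2

def loo_error(g1, g2, p1, p2):
    """Step 6: leave-one-out cross-validated misclassification rate."""
    wrong = 0
    for label, grp in ((1, g1), (2, g2)):
        for k in range(len(grp)):
            rest = grp[:k] + grp[k + 1:]          # hold out observation k
            a, c = fit(rest, g2, p1, p2) if label == 1 else fit(g1, rest, p1, p2)
            wrong += allocate(grp[k], a, c) != label
    return wrong / (len(g1) + len(g2))

# Step 2: priors estimated here from the sample proportions (an assumption).
p1 = len(G1) / (len(G1) + len(G2))
p2 = 1.0 - p1

a, cutoff = fit(G1, G2, p1, p2)
print(allocate((3.0, 4.0), a, cutoff))   # prints 1
print(loo_error(G1, G2, p1, p2))         # prints 0.0
```

Holding out one observation at a time and refitting the rule avoids the optimism of the resubstitution estimate, which applies the rule to the same data used to build it. With unequal covariance matrices the same outline would apply, but with a quadratic score in place of the linear one.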